Introduction


According to the 2019 World Trade Report, trade in services is likely to increase its share of global trade by 50 percent by 2040. Services are likely to benefit most from the increasing automation and digitalization of formerly face-to-face processes, and from growing demand for online services due to demographic change. The WTO states that global cooperation has to be increased so that all economies can benefit collectively from growing trade in services.

With the globalization of services comes a globalization of knowledge. According to the Research Perspectives of the Max Planck Society, globalization is a nonlinear process, which can lead not only to homogeneity and the standardization of culture, but also to an increase in complexity, as tools and ideas tend to outpace cultural progress. As face-to-face problem-solving is increasingly replaced by digital services, those confronting global problems that require global cooperation will have to gain competence in global and complex problem-solving (CPS).

A prominent example of such a complex, global problem is anthropogenic climate change. The Intergovernmental Panel on Climate Change (IPCC) addresses the high imponderability of climate change and its impact on decision-making and policies in its “Integrated Risk and Uncertainty Assessment of Climate Change Response Policies”. In this report, the IPCC states that decision-makers tend to base their decisions on intuitive thinking processes rather than on thorough analysis, and that the perception of risk has to be included in climate change risk management (Kunreuther et al., 2014).

Human decision-makers are not led by rational decision-making alone: insights from behavioral economics show that people are guided by intrinsic motives, biases, and myopic interpretations of feedback, casting doubt on whether humanity is capable of effectively solving complex problems of global proportions.


With growing successes in the area of artificial intelligence (AI), the United 
Nations Economic and Social Council has stated concerns that AI may not only 
offer advantages, but also 


“disrupt societies in fundamental ways”, 


with people being replaced by automated decision-making devices (United Nations Economic and Social Council, 2019, p. 5). For this reason, talent search is of crucial importance to support domains threatened with replacement by artificial systems. The UN High-level Committee on Management places a focus on the identification of talent by automated processes in the area of assessment and testing (United Nations Economic and Social Council, 2019). The hybrid approach of embedding expert knowledge into neural networks, commonly used for AI systems, has been suggested and implemented through the combined effort of various institutes (Barca, Porcu, Bruno, & Passarella, 2017; Chattha et al., 2012; Silva & Gombolay, 2019), raising questions regarding the accountability and regulation of such AI-guided decisions (Doshi-Velez & Kortz, 2017). Since the 2008 economic crisis, which changed employment worldwide, the creation of sustainable employment has become a core goal for European institutions such as the European Foundation for the Improvement of Living and Working Conditions. For systems to act sustainably, they must be flexible and resilient, and knowledge about a system’s state is key (Jeschke & Mahnke, 2013). In 2015, the European Commission further increased the flexibility of the European “Stability and Growth Pact” to “build up fiscal buffers” for its member states; according to a 2018 report, these buffers were indeed implemented successfully (European Commission, 2018).

The search for expert knowledge is guided not only by ethics. When trying to gain knowledge of a system as large and complex as the European market, obtaining sufficient amounts of empirical data can be a challenge. Expert knowledge can be used to replace missing data in order to support sound predictions. Highly complex problems come with uncertainty, especially when empirical data is limited. Psychological observations have shown that expert knowledge tends to be biased when experts face uncertainty without guidance (European Food Safety Authority, 2014). The European Food Safety Authority has therefore suggested that expert identification and management be organized in a structured manner, resulting in a database of experts. The “Division for Sustainable Development” of the United Nations Department of Economic and Social Affairs builds upon multi-agent action networks, consisting of resources, knowledge and experts, in order to achieve its global sustainability goals. In its 2016 report, the three top challenges listed for such networks are limited financial resources, followed by changing mindsets and change management, and thirdly by human resources, as depicted in Figure 1.1 (Division for Sustainable Development. United Nations Department of Economic and Social Affairs, 2016, p. 13).


Challenge                                          Value
Limited Financial Resources                         32
Changing Mindsets and Change Management             18
Human Resources                                     10
Infrastructure (Transport, Internet)                 8
Political Decision-Making Cycle                      5
Weak Region- or Country-Specific Knowledge           2
Lack of Leadership and Stakeholder Commitment        2
Communication                                        2
Coordination                                         2
Capacity Building                                    1
Lack of Data and Information                         1


Figure 1.1 Top challenges of modern decision-making networks. Source Division for 
Sustainable Development. United Nations Department of Economic and Social Affairs, 2016, 
p. 13 


Future economies will inevitably face global problems due to ever-growing connectivity and service-oriented trade. Novel ideas and technological breakthroughs will outpace slow cultural development, leading to increasing complexity. Global asymmetries in knowledge and information will further fuel change, making routine problem-solving unreliable and its outcomes volatile, thus endangering those who cannot meet modern workplace requirements.

Experts in CPS and non-routine decision-making need to be identified and placed in an environment where their actions are most fruitful, so that others can imitate and learn from their success. This could be enabled by a cheap and effective online assessment tool, as financial resources are limited by default. As expert knowledge is especially biased when addressing problems under uncertainty, this thesis focuses on two major goals: i) the development of a non-routine problem-solving (NPS) assessment in the form of a highly efficient, online, web browser-based software tool; and ii) obtaining empirical results on human individual and group decision-making (GDM) in the face of uncertainty, change, different system states, and various forms of environmental public information.

Data on human decision-making behavior were acquired in a randomized experiment, which is considered to be the “gold standard” in scientific research (Rubin, 2008). As in any randomized experimental design, participants were randomly assigned to different public information conditions in which circumstances were actively manipulated. The experiment was run both offline and online; however, the online experiment offered many advantages over its offline counterpart, chiefly greater cost-efficiency and the possibility of modeling all participants’ perspectives via strategy or logic categories. Experiments are considered to increase innovation (Kohavi, Longbotham, Sommerfield, & Henne, 2009), and cost-efficient online assessments may support institutions and companies alike in finding experts, assigning them to the working domain that best fits their skills, measuring and controlling the impact of information, and ultimately supporting management in coping successfully with complex problems.
Theoretical Background


Imagine being born and raised on the Hawaiian island Kaua’i, close to the shield volcano Wai’ale’ale. On this island, yearly rainfall reaches 15 meters or more (Kido, Ha, & Kinzie, 1993, p. 44). You have been confined to this small region of this remote island your entire life, and no information has ever reached you to indicate that this extreme amount of rainfall is extraordinary. To you, heavy rain is the daily norm. Your day-to-day decision-making has thus been shaped by this routine, leading you to believe that constant rain is entirely normal. Even short periods of “rain dropouts” will not change your belief that rain is the regular “status quo” of life. You develop strategies to survive on the island by making use of the rain: building water mills, collecting rainwater to drink, and recovering energy in warm baths. Your tribe members also develop survival strategies based on the stream of rain. While none of them question that there is a lot of annual rain, some believe that the short periods of “rain dropouts” are caused by godly external factors, which cannot be influenced and are entirely random. Others question this worldview, suggesting that they have observed some regularities in the occurrence of “rain dropouts”, which, in their understanding, could be used to maximize the water mills’ effectiveness. Some tribe members even assure you that “rain dropouts” can not only be anticipated, but are influenced by tribal sacrifices.

While the worldviews of the tribe members differ, all of them are true experts when it comes to making use of rain. Nevertheless, all individual worldviews of the tribe are wrong, since they lack global information about the true nature of rainfall. However, the chance of collective survival was enhanced by actions and beliefs based on mental models shaped by local experience with rain. So even though the worldviews were at best a true representation of reality locally, in other words homeomorphic mental models, these mental models did produce good performance, measured in days of survival. These mental models even proved to be effective in a group. Individual expert knowledge led to collective strategies following a common focal point, namely “dealing with heavy rainfall”. This focal point enabled the tribe to integrate heterogeneous decisions into a “direction” or path towards a common goal, even though each individual’s decision is also influenced by other tribe members’ decisions. It might also be that some individual decisions were bad, being based on a locally wrong mental model, but ultimately led to a good group outcome, and vice versa.

These outcomes might lead to the false confirmation of a certain mental model. A tribe member who believes he has found a pattern in the “rain dropouts” might invest less working time in the “water mills” shortly before he anticipates a lack of rain, focusing more on bathing in warm rainwater. This lack of work discipline might alter the decision-making of those who believe that bathing enrages the gods, which, in their understanding, leads to less rain. As they see more and more “pattern-belief members” bathing, the “sacrifice-belief group” begins to collect more rainwater, working night shifts as a sacrifice to soothe the gods. Assuming that a short “rain dropout” actually occurred, which was no surprise to the “chaos-belief group”, who regard short dropouts as happening randomly all the time, the sacrifice-, pattern-, and chaos-belief groups are all locally confirmed in their beliefs. However, group performance still held up well, since one group gathered strength by relaxing, another collected more resources for drier times, and the rest maintained their working routine. The collective group performance equilibrium proved to be stable.

A change in environmental conditions, such as “rain dropouts”, only impedes performance when individual strategies are affected; in other words, as long as rainfall is sufficiently reliable, individual decisions will not change much. As “rain dropouts” grow longer, individual decisions are likely to adapt to these changes, even influencing group performance and “perturbing” the collective group decision network. These perturbations can themselves lead to changes in individual decision-making when tribe members’ decision outputs, such as production, depend on each other. Causally linked decisions might break or be formed anew, rearranging the “rules” of the network. In any case, a “change in rules” of this network, whether it stems from environmental changes, mental models, third-party decisions or group dynamics, first has to be identified by an individual decision-maker, who then builds a new mental model from this novel knowledge before applying a new strategy based on it to reach a certain goal.




From this small island economy “Gedankenexperiment”, several important aspects can be derived that play a role in modern scientific approaches to decision-making.

As mentioned at the beginning of the story, all tribe members (agents) only had access to local information, represented metaphorically by the small island, which can be regarded as a market where decision-making takes place. Even though each individual had full access to the necessary market information, decision-making differed and was not optimal, as the interruptions of rainfall were interpreted differently. Feedback was interpreted myopically to confirm one’s own belief. Agents do not act optimally even when provided with perfect information and knowledge of the system structure, due to the “misperception of feedback”, which is part of day-to-day economic reality (Sterman, 1989). The tribe had three different theories regarding the “data” stemming from rainfall observations: the first group believed it could anticipate rain dropouts by observing “patterns”, the second group thought it possible to control “rain dropouts”, and the third group saw “rain dropouts” as an entirely random environmental condition that cannot be controlled at all. The discrepancy between the tribe’s data and their theories leads to “errors”, which influence the outcomes of decision-making. From an empirical perspective, defining “error” is a complex task with a long history of development, and it marks a cornerstone of economic statistics (Louçã, 2007). Error can arise from an improper choice of model, from a lack of precision in measurement, or even from cultural chaos (Louçã, 2007). The nature of an error may also vary: errors can be seen as part of nature, as an unobservable disturbance, or as unpredictable random behavior. Some mathematical descriptions define errors as residual and observable; some see them as corrigible, and some do not. Disturbances can acquire “their own life” and are more than a nonconformity with some anticipated value; in any event, an unobserved “disturbance vector” and an observed “residual vector” should be distinguished (Louçã, 2007, p. 151).
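The distinction between an unobserved disturbance and an observed residual can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions and is not taken from Louçã (2007): it assumes a simple linear model y = β0 + β1·x + u with hypothetical parameters β0 = 2 and β1 = 0.5, draws the disturbances u itself (which is only possible in a simulation), fits ordinary least squares (OLS), and compares the fitted residuals e = y − ŷ with the true disturbances.

```python
import random

def simulate_and_fit(n=50, beta0=2.0, beta1=0.5, seed=1):
    """Simulate y = beta0 + beta1*x + u, fit OLS, return (disturbances, residuals, x).
    beta0, beta1 and the noise level are hypothetical illustration values."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]
    # Disturbances u: known here only because we simulate them; unobservable in real data.
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

    # Closed-form OLS estimates for a single regressor with an intercept.
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    # Residuals e: observable, computed from the fitted model.
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return u, e, x

u, e, x = simulate_and_fit()
print(f"sum of residuals:    {sum(e):.2e}")   # ~0 by construction of OLS
print(f"sum of disturbances: {sum(u):.2f}")   # generally not 0
print(f"max |e - u|:         {max(abs(ei - ui) for ei, ui in zip(e, u)):.3f}")
```

Because OLS forces the residuals to sum to zero and to be uncorrelated with x, the residual vector is a constrained estimate of the disturbance vector, not the disturbances themselves.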

In other words, even a small and simple economy can develop complex and unpredictable self-organizing behavior. Durlauf (1998) defines “economic complexity” as a property of systems in which choices depend directly on the decisions of others. Such systems are evolving and cannot be fully understood or described by “steady states” when there is limited information about the intentions and goals of third-party agents (Durlauf, 1998). Such “steady states” are unchanging regularities or “atoms” of a system. The author further explains that “complex systems” exhibit nonlinear attributes because of the interdependence of their acting agents’ decisions, and that a very important aspect of complex systems is their history of past events, or the order of information, on which their future outcomes depend. This complex history can result in “path dependence” (Durlauf, 1998). “Path dependence” roughly describes “ugly habits” of a system, which are persistent and can lead to recessions.
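Path dependence is often illustrated with a Pólya urn process, a model associated with Arthur’s work on increasing returns; it is used here as a swapped-in illustration and is not part of Durlauf (1998). In the minimal sketch below, a hypothetical urn starts with one ball of each of two colors; every draw returns the ball together with one more ball of the same color, so early random draws are reinforced and different histories settle on different long-run shares.

```python
import random

def polya_urn_share(steps, seed, red=1, blue=1):
    """Return the final share of red balls after `steps` reinforcing draws
    (hypothetical two-color urn, starting counts given by red and blue)."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Draw a ball with probability proportional to the current counts.
        if rng.random() < red / (red + blue):
            red += 1   # reinforcement: a red draw makes future red draws more likely
        else:
            blue += 1
    return red / (red + blue)

# Same rules, different random histories: the long-run shares typically differ.
for seed in range(5):
    print(seed, round(polya_urn_share(10_000, seed), 3))
```

Each run settles on a stable share, but which share it locks into depends on the early sequence of draws, that is, on the system’s history.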

To understand a complex system’s behavior by applying models, several problems have to be coped with: simply looking at unchanging consistencies does not suffice, nonlinear dynamics may make predictions highly volatile, and “bad system behavior” can only be explained with large amounts of data. The human brain does not perform well at storing such large amounts of data and is better suited to pattern recognition from visual inputs (Simoes & Hidalgo, 2011), as it is in constant search of known patterns, acting as an “association machine” (Chlupsa, 2017). For this reason, clear visual representations are used in models that cope with “economic complexity”, such as the “Atlas of Economic Complexity” (Hausmann et al., 2014). When no visual cues are provided to understand economic complexity, decision-makers might be overwhelmed, and even expert knowledge might not suffice. It has been shown, for example, that antitrust analysis has become too complex for judges to evaluate accurately when expert knowledge is missing: while basic economic training helps in simple cases, it failed to show a significant positive influence in complex cases, leading to the conclusion that some antitrust cases are in fact too complex for generalist judges (Baye & Wright, 2011).

Expert knowledge seems to be a necessity for successfully coping with problems of economic complexity. However, real-world problems are commonly not well defined and can hardly be distinguished from their irrelevant environmental conditions, and modeling such fuzzy problems in a way that makes them solvable often proves to be the true challenge (Davidson & Sternberg, 2003). This also applies to problems stemming from economic complexity, as the individual goals and interpretations of others are unknown and constantly changing, while this information, or the lack thereof, can ultimately influence the outcome of one’s decision. Just like complex economies, individual agents or decision-makers can be described as constantly evolving systems, called “cognitive systems”, which constantly model their environment, focusing on “local aspects” representing barriers to the effective solution of a problem (Holland, Holyoak, Nisbett, Thagard, & Smoliar, 2008).

It is then not a far-reaching assumption to define an economy in psychological 
terms. A market can be understood as a network of subjective instances serving 
as an input for strategies and volition in decision-making (Arthur, 1995). Agents 
or cognitive systems make choices based on their currently valid beliefs, which 
are subjective and often unknown to others. These beliefs are constantly tested by 
the system, which itself is built from all agents’ subjective beliefs (Arthur, 1995). 
So, while the small island economy from the Gedankenexperiment does fulfill all the attributes of “economic complexity” mentioned above, which economic systems are considered “complex” in reality? The complexity of an economic system also represents national production capabilities as non-tradable inputs (Hausmann & Hidalgo, 2010; Hidalgo & Hausmann, 2009), which influence a country’s productivity; an increase in the complexity of a country’s production structure is positively related to its capabilities (Zhu & Li, 2017). According to Felipe et al. (2012), Japan, Germany, the U.S.A., France and other wealthy countries are considered countries with high complexity, while countries with relatively low income per capita, such as Cambodia, Papua New Guinea and Nigeria, are considered to have low complexity (Zhu & Li, 2017).
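The capability-based view of complexity cited above (Hidalgo & Hausmann, 2009) is computed from a country-product export matrix. The sketch below shows only the first two steps of their “method of reflections” on a small, hypothetical binary matrix M (the countries A, B, C and the four products are invented for illustration): diversity k_c,0 counts the products a country exports, ubiquity k_p,0 counts the countries exporting a product, and the next step averages each quantity over the other side. It is a simplified illustration, not a full Economic Complexity Index computation.

```python
# Hypothetical binary export matrix: M[c][p] = 1 if country c exports product p.
M = {
    "A": [1, 1, 1, 1],
    "B": [1, 1, 0, 0],
    "C": [1, 0, 0, 0],
}
countries = list(M)
n_products = len(next(iter(M.values())))

# Step 0: diversity of each country and ubiquity of each product.
k_c0 = {c: sum(M[c]) for c in countries}
k_p0 = [sum(M[c][p] for c in countries) for p in range(n_products)]

# Step 1: average ubiquity of a country's products (lower = rarer, more exclusive
# products) and average diversity of the countries exporting each product.
k_c1 = {c: sum(M[c][p] * k_p0[p] for p in range(n_products)) / k_c0[c] for c in countries}
k_p1 = [sum(M[c][p] * k_c0[c] for c in countries) / k_p0[p] for p in range(n_products)]

print("diversity k_c0:", k_c0)   # {'A': 4, 'B': 2, 'C': 1}
print("ubiquity  k_p0:", k_p0)   # [3, 2, 1, 1]
print("k_c1:", k_c1)             # {'A': 1.75, 'B': 2.5, 'C': 3.0}
print("k_p1:", k_p1)
```

In this toy matrix, country A is both the most diverse and exports the least ubiquitous products on average, which is the pattern the measure associates with high complexity.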

While real-life economies do not have to cope with changes in “rain frequencies” like the small island economy, a country does have to cope with climate, technological, socio-economic and political change, all of which entail uncertain future scenarios. A “best guess” of what might happen, as made by the three belief groups of the Gedankenexperiment, is not a good way to cope with such uncertainty, because in such decision-making domains multiple possible paths lead to different future scenarios to which no probabilities of occurrence can be assigned, so that the scenarios cannot be ranked by probability (Maier et al., 2016). It is then better to create a strategy that performs well across multiple scenarios.
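Choosing a strategy that performs well across scenarios, without assigning probabilities to them, can be sketched with two classic decision rules for deep uncertainty: maximin and minimax regret. These rules are an assumed illustration and are not necessarily the methods discussed by Maier et al. (2016); the strategies, scenarios and payoffs below are hypothetical values loosely based on the island Gedankenexperiment.

```python
# Hypothetical payoffs (e.g., days of supply) per strategy under each scenario.
# No probabilities are attached to the scenarios.
scenarios = ["steady rain", "short dropout", "long dropout"]
payoffs = {
    "build water mills": [10, 6, 1],
    "collect rainwater": [7, 6, 5],
    "bathe and rest":    [8, 3, 0],
}

def maximin(payoffs):
    """Pick the strategy whose worst-case payoff is highest."""
    return max(payoffs, key=lambda s: min(payoffs[s]))

def minimax_regret(payoffs):
    """Pick the strategy whose largest regret (shortfall from the best
    strategy in the same scenario) is smallest."""
    n = len(next(iter(payoffs.values())))
    best = [max(p[i] for p in payoffs.values()) for i in range(n)]
    max_regret = {s: max(best[i] - p[i] for i in range(n)) for s, p in payoffs.items()}
    return min(max_regret, key=max_regret.get)

print(maximin(payoffs))         # collect rainwater
print(minimax_regret(payoffs))  # collect rainwater
```

A best guess that only considers “steady rain” would favor the water mills, whereas both robust rules select the strategy that never performs badly in any scenario.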

However, such a “stable” strategy is not easily constructed in complex economies, as beliefs alter decision-making. Whether a cognitive system considers an event to be a random outcome or manmade has an impact on the agent’s decision-making. When an event is considered random, agents stick with simple rules to optimize their strategy; when an event is thought to be manmade, agents try to figure out patterns to optimize (Schul, Mayo, Burnstein, & Yahalom, 2007). Agents might stick to their personal beliefs even though new information indicates that deviating from their strategy might be beneficial, which is linked to several decision anomalies, such as confirmation bias, inertia bias, or weighting bias. It can also be linked to “routine”. The three belief groups of our Gedankenexperiment stick to their own routines, further strengthening their beliefs and possibly feeding their confirmation bias. It is known that a strong routine increases the preference for information that favors the routine and makes information that contradicts it less favorable (Betsch, Haberstroh, Glöckner, Haar, & Fiedler, 2001).

Altogether, the simple story about an island tribe, a respectful homage to the famous “Lucas islands model” by Nobel Prize-winning economist Robert Lucas, Jr. (Lucas, 1972), sheds light on many important aspects of decision-making and problem-solving. These aspects are explained in greater detail, together with the latest insights from scientific experiments, in the following chapters.


2.1 Key Aspects for Real Economic Problem-Solving 


Many models attempt to describe how humans engage in problem-solving. Modeling problem-solving raises multiple questions: Which instances of reality do humans perceive as problems, and how can problems be categorized? How do humans define the boundaries of a problem, and how can such boundaries be modeled? How do humans naturally search for solutions, and which scientific insights describe such problem-solving attempts? In order to apply problem-solving to domains of real economic decision-making, several key aspects are explained in the following: the two major categories describing problems in general, the definition and role of complexity in problem-solving, the definition and meaning of heuristics, and the definition and background of uncertainty.


2.1.1 Well-Defined Problems 


In general, problems that are to be solved fall into two categories: well-defined and ill-defined problems. This generalization is effective, as all domains hold both well- and ill-defined problems (Nye, Boyce, & Sottilare, 2016), and different cognitive skills are required for solving well- and ill-defined problems (Schraw, Dunkle, & Bendixen, 1995).

Problems that can be broken down into a series of sub-problems, and that provide enough information about their goals, solution paths and obstacles, are considered well-defined; these problems can usually be solved using recursive algorithms (Davidson & Sternberg, 2003).

The famous “Tower of Hanoi” problem is considered a “well-defined” problem (Davidson & Sternberg, 2003). It can be solved perfectly either with an iterative or recursive algorithm or by applying a strategy consisting of several steps that always solves the problem in the minimum number of moves.
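The recursive solution can be sketched in a few lines. This is a standard textbook formulation rather than code from the cited sources, and the peg names are arbitrary: moving n disks reduces to moving n − 1 disks out of the way, moving the largest disk, and moving the n − 1 disks back on top of it, which yields the minimum of 2^n − 1 moves.

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the list of moves (from_peg, to_peg) that transfers n disks
    from `source` to `target` in the minimum number of moves."""
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, source, spare, target, moves)   # clear the n-1 smaller disks
        moves.append((source, target))               # move the largest disk
        hanoi(n - 1, spare, target, source, moves)   # restack the smaller disks
    return moves

moves = hanoi(3)
print(len(moves))   # 7, i.e. 2**3 - 1
print(moves[:3])    # [('A', 'C'), ('A', 'B'), ('C', 'B')]
```

Passing one shared `moves` list through the recursion avoids concatenating lists at every level; the `None` default avoids Python’s shared-mutable-default pitfall.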

Multiple classifications exist for distinguishing between well- and ill-defined problems, as well- and ill-defined problems lie on a continuum (Le, Loll, & Pinkwart, 2013).
2.1.2 Ill-Defined Problems


Contrary to well-defined problems, ill-defined problems cannot be solved with recursive algorithms, as they cannot be modeled as a set of steps necessary to solve them; they lack information about a clear path to the solution or do not provide any statement about how the problem at hand can be solved (Davidson & Sternberg, 2003).

From the perspective of a rookie, a problem might seem “ill-defined” due to lack of experience. However, in such a case the problem is merely “undefined” and not “ill-defined” (Nye et al., 2016; Strunz, 2019). A person who has never played the well-defined game “Tower of Hanoi” before will begin to develop a strategy and optimize it further until the most efficient strategy is found. At this point, “Tower of Hanoi” is regarded as a well-defined problem. This process is known as “learning”, and for this reason, the domain of “learning” is useful for distinguishing between well- and ill-defined problems.

When learning is applied to ill-defined problems, further categories are required. Ill-defined problems are regarded as “complex problems”, and the attempt to solve them is regarded as “complex problem solving” (CPS) (Dörner & Funke, 2017).

As described before, most problems in real life are “fuzzy” or lack relevant information, which places them in the category of complex problems. Any complex problem is always an “ill-structured problem” (Grünig & Kühn, 2013), which can be understood as analogous to an ill-defined problem. Multiple domains then have to be considered when trying to define a theory of “problem-solving”, since an agent in economic reality most likely faces an unknown, ill-defined or complex problem. First, information might be interpreted differently by each agent, leading to heterogeneous problem perceptions. Second, rookies might lack a definitive “recipe” of action required to solve a problem. Third, even when some action is considered suitable, it is not yet clear which intrinsic processes led to the decision itself. Fourth, even if this process is successfully analyzed, it is unclear how an agent judges the action as positive or negative, as in “bringing the agent closer to the goal”. Last but not least, it is unclear how an agent “finds” a problem and “recognizes” it as such; agents differ in their goal-setting priorities, and it is unclear why a certain path towards a goal is chosen. As depicted in Figure 2.1 (Ohlsson, 2012, p. 122), all five of these domains would have to be combined in order to picture “problem-solving” fully, described as “heuristic search”. The cognitive psychologists Newell and Simon stated that humans are able to solve unfamiliar problems by tentatively choosing different actions, mentally projecting the outcomes of the chosen action, and evaluating them; this evaluation is then used as new input for their decision-making process, so that they can alter their approach to the problem. Newell and Simon referred to this as “heuristic search” (Ohlsson, 2012).
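Newell and Simon’s loop of tentatively choosing an action, projecting its outcome, and evaluating it can be illustrated with a greedy best-first search. The toy problem (reach a target number from a start number using the hypothetical actions “add 1” and “double”) and the evaluation function (distance to the target) are assumptions made for illustration and are not part of Ohlsson (2012).

```python
import heapq

def heuristic_search(start, goal, max_expansions=10_000):
    """Greedy best-first search: repeatedly expand the state whose projected
    outcome looks closest to the goal. Returns the list of actions taken,
    or None if the expansion budget runs out."""
    actions = {"add 1": lambda s: s + 1, "double": lambda s: s * 2}
    evaluate = lambda s: abs(goal - s)           # outcome evaluation
    frontier = [(evaluate(start), start, [])]    # (evaluation, state, action path)
    visited = {start}
    while frontier and max_expansions > 0:
        max_expansions -= 1
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for name, act in actions.items():        # tentatively choose an action
            nxt = act(state)                     # mentally project its outcome
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(frontier, (evaluate(nxt), nxt, path + [name]))
    return None

print(heuristic_search(1, 10))   # ['add 1', 'double', 'double', 'add 1', 'add 1']
```

The greedy evaluation finds a solution, but not necessarily the shortest one: here it uses five actions, while “double, double, add 1, double” needs only four. Heuristic search trades guaranteed optimality for tractability.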

Complex problem solving builds upon the understanding that ill-defined (ill-structured) problems entail a lack of information that cannot be remedied at the outset, which gives rise to uncertainty. Complex problems do not require complex solutions; however, a “bias bias” might lead to underestimating the performance of simplicity, which excels under conditions of high uncertainty (Brighton & Gigerenzer, 2015).


[Figure 2.1 components: Problem perception; Problem finding/Goal setting; Decision making/Action selection; Outcome evaluation/judgment]

Figure 2.1 “The structure of a hypothetical theory of problem solving.” Source Ohlsson, 2012, p. 122
2.1.3 Definitions of Complexity


“Complexity” in everyday language can describe problems that one regards as “difficult” to solve. In the economic domain, task difficulty and task complexity are two different attributes: difficult problems are solved by incentivizing diverse problem-solving alternatives, while institutions cope with complexity by adjusting selection criteria, rates of variation, and connectedness (Page, 2008). To make predictions about the future, economic models rely on assumptions about reality expressed by mathematical functions originating from theoretical physics, computer science or sociology.

Whenever the complexity of an entity such as a market, country, global economy or project is to be measured, the modeler first has to define the “system”: its boundaries, its instances and their relations, which together constitute the “system” separated from its environment. Before even defining “complexity” itself, it has to be noted that the modeler might run into the “frame problem” when defining a system. When defining entities (states) and their relations, it makes sense to choose from a set of things that are meaningful for describing the system. For example, defining the system “engine” results in a meaningful list of cogs, metal rods and other parts that, when changed in their structure or behavior, will also change the engine itself. However, by defining a list of things that change, everything else is ignored and assumed not to change at all, which is referred to as the “commonsense law of inertia” (Kameramans & Schmits, 2004). While this assumption solves the “frame problem” for more common models, more sophisticated solutions have to be applied to actually solve the frame problem when cognitive agents are to be modeled, such as “Thielscher’s Fluent Calculus”, which is used, for example, when robots are required to face “non-determinism and uncertainty” (Kameramans & Schmits, 2004, p. 45).

In other words, when the modeler wants to define a “system” that, just like a cognitive agent, scans its environment for change in order to adapt its behavior to novel circumstances, its “states” or “entities” and their relations have to be modeled as “fluent” states. The truth values of fluent states depend on the current context, and functions operating on such fluent states are therefore adaptive. Once a system is defined, its complexity can be measured.
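The “commonsense law of inertia” can be illustrated with a state-update rule in the spirit of the fluent calculus, where an action’s effects are stated explicitly and every fluent the action does not mention persists. This is a heavily simplified sketch: it assumes a state is a set of fluents and that each hypothetical action lists which fluents it removes and adds; it omits the non-determinism and uncertainty handling of Thielscher’s full calculus.

```python
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    negative: FrozenSet[str]   # fluents the action makes false
    positive: FrozenSet[str]   # fluents the action makes true

def update(state: FrozenSet[str], action: Action) -> FrozenSet[str]:
    """Successor state = (state minus negative effects) plus positive effects.
    Fluents not named by the action carry over unchanged (law of inertia)."""
    return (state - action.negative) | action.positive

# Hypothetical fluents from the island example.
state = frozenset({"raining", "mill_running", "tank_half_full"})
dropout = Action("rain_dropout",
                 negative=frozenset({"raining", "mill_running"}),
                 positive=frozenset())
after = update(state, dropout)
print(sorted(after))   # ['tank_half_full']
```

Only the effects of “rain_dropout” need to be written down; the water tank’s state persists without an explicit frame axiom for it.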

Complexity has many definitions, which vary with the scientific domain in which it is used. According to Efatmaneshnik and Ryan (2016), “complexity” was first mentioned in a 1948 article titled “Science and Complexity”, which stated that physical science had mostly been interested in two-variable problems and that the life sciences regard such simplicity as insignificant. Today, the term “complexity” has been used in so many different variations and contexts that its meaning has become unclear (Efatmaneshnik & Ryan, 2016). Efatmaneshnik and Ryan (2016) differentiate between objective and subjective complexity in their generic framework. They define objective complexity as the size of the minimum description necessary to describe a system. Objective complexity does not depend on any observer’s perspective or viewpoint, but can be context- and goal-dependent. Subjective complexity, on the other hand, depends on the modeler’s choice of reference model. As depicted in Figure 2.2 (Efatmaneshnik & Ryan, 2016, p. 4), objective complexity is defined by the context and by the modeler’s (observer’s) definition of the system. So, while it is independent of the subjective viewpoint of a modeler, it still depends on the modeler’s subjective definition of the observed “system”.

[Figure: framework for measuring system complexity. Subjective complexity is relative to the observer; objective complexity is independent of the observer; all elements in the figure are context-dependent.]

Figure 2.2 A generic framework for measuring complexity. Source Efatmaneshnik & Ryan, 2016, p. 4


The definition of the “system” and whatever the modeler subjectively regards as simple together determine a “reference simplicity”. Complexity is then measured as the distance from this “reference simplicity”. This generic framework by Efatmaneshnik and Ryan (2016) can be used for a variety of complexity measures, such as Statistical Complexity, Complexity in Engineered Systems, Complexity Measures for Graphs and Complexity of Repeating Patterns, and it can be applied to evolving, dynamic models, which include learning agents. Distinguishing between objective and subjective complexity enables the modeler to include multiple perspectives, whose reference simplicities naturally differ, leading to a “gap” between the agents’ views. Every reference point comes with a constant objective complexity and various subjective complexity measures, which depend on the respective agent.
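Neither the minimum description size nor a reference simplicity is directly computable. As a rough illustration only, the following Python sketch assumes compressed length (via the standard `zlib` module) as a proxy for description size. The plain compressed length stands in for an observer-independent measure, while compressing with an agent's prior knowledge supplied as a preset dictionary (`zdict`) stands in for complexity measured relative to that agent's reference. The byte strings and agent names are hypothetical, and this compression proxy is an assumption for illustration, not part of the Efatmaneshnik and Ryan framework.

```python
import zlib

def description_size(system: bytes) -> int:
    """Proxy for objective complexity: bytes needed for a compressed description."""
    return len(zlib.compress(system, 9))

def relative_description_size(system: bytes, reference: bytes) -> int:
    """Proxy for subjective complexity: description size when the agent's
    reference knowledge is available to the compressor as a preset dictionary."""
    if reference:
        comp = zlib.compressobj(level=9, zdict=reference)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(system) + comp.flush())

# Hypothetical system description and two agents with different references.
system = b"cog rod piston crankshaft valve spark plug timing belt gasket"
expert = b"engine parts: cog rod piston crankshaft valve spark plug timing belt gasket"
novice = b"recipe: flour sugar butter eggs milk vanilla baking powder salt"

objective = description_size(system)
subjective = {name: relative_description_size(system, ref)
              for name, ref in (("expert", expert), ("novice", novice))}
gap = subjective["novice"] - subjective["expert"]  # the "gap" between the two views
print(objective, subjective, gap)
```

The same system yields one reference-free size but different agent-relative sizes; the expert, whose reference already contains most of the description, perceives far less complexity than the novice.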

As multiple cognitive agents will ultimately have different views on what defines (subjective) simplicity, they will inevitably have different views on the measure of complexity. This is why “complexity economics” sees reason to take these deviations into account when contracts are concluded. Complexity economics states that multiple agents will disagree on the “reality” of a system after a written agreement or contract has been made. The agents then disagree on performance indicators, as indicated by Figure 2.3 (Nota & Aiello, 2014, p. 88).


[Figure: plot with the label “Planned cost”]

Figure 2.3 Deviation distance of two perspectives on the individually perceived reality of some project over time. Source Nota & Aiello, 2014, p. 88


This deviation of perspectives occurs when “system boundaries” are set by more than one modeler. For this reason, “corporate decision-makers need to reflect the company as part of an open system” (Jeschke & Mahnke, 2016, p. 73), in which the system and its environment are defined by a meaningful boundary (“Sinngrenze”). This boundary is open to other meaningful entities coming from heterogeneous perspectives, definitions and viewpoints, as long as internal selection rules are applied by which such entities are approved or denied (Luhmann, 2012, p. 178).

In the domain of corporate decisions, such selection rules should be defined neither too broadly nor too narrowly, so that critical information is included and managerial focus is preserved (Jeschke & Mahnke, 2016). As depicted in Figure 2.4, such system boundaries can be modelled along two dimensions: the range of the considered system constituents and the time horizon of the system analysis (Jeschke & Mahnke, 2016, p. 74).
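The two-dimensional selection rule can be sketched as a simple filter. In the following hypothetical Python example, each candidate constituent carries an assumed distance from the company's core (standing in for the range of considered constituents) and the time at which it is expected to matter (standing in for the time horizon); the rule admits only candidates within both limits. The attribute names, values and thresholds are illustrative assumptions, not part of Jeschke and Mahnke's model.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Candidate:
    name: str
    scope_distance: int   # hypothetical: 0 = core company, higher = further out
    horizon_years: float  # hypothetical: when the candidate becomes relevant

def select(candidates: List[Candidate], max_scope: int, max_horizon: float) -> List[str]:
    """Admit a candidate into the system only if it lies inside both boundary
    dimensions: the range of constituents and the time horizon."""
    return [c.name for c in candidates
            if c.scope_distance <= max_scope and c.horizon_years <= max_horizon]

candidates = [
    Candidate("production line", 0, 0.5),
    Candidate("key supplier", 1, 1.0),
    Candidate("competitor", 2, 3.0),
    Candidate("climate regulation", 3, 10.0),
]
print(select(candidates, max_scope=2, max_horizon=5.0))
# ['production line', 'key supplier', 'competitor']
```

Tightening both limits drops potentially critical constituents, while loosening them dilutes managerial focus, which mirrors the balance described above.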


Figure 2.4 System boundaries defined by a 2-dimensional selection rule. Source Jeschke & Mahnke, 2016, p. 74

Based on such a selection rule, the complexity of a system can be categorized, e.g. along multiple non-correlated dimensions such as multiplicity, interdependency, diversity, dynamics (Jeschke & Mahnke, 2016) and imponderability (Jeschke, 2017). Analyzing a system’s complexity with this 5-dimensional model yields 32 distinguishable types of complexity, which describe different scenarios of decision-making complexity. For each type, Jeschke (2017) suggests different approaches for CPS or operations to reduce uncertainty, such as clustering analysis to reduce uncertainty from high multiplicity, cross-impact analysis to cope with interdependency, specialization to counter high diversity, sound business process management to keep pace with high dynamics, and risk management to handle high imponderability.
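Assuming, as the count of 32 suggests, that each of the five dimensions is rated simply as high or low (2^5 = 32), the categorization and the operations listed above can be sketched in Python. The binary rating and the lookup table are a simplification for illustration, not Jeschke's (2017) published procedure.

```python
from itertools import product
from typing import Dict, List

DIMENSIONS = ("multiplicity", "interdependency", "diversity", "dynamics", "imponderability")

# Operation suggested when a dimension is rated high (after the list above).
OPERATIONS = {
    "multiplicity": "clustering analysis",
    "interdependency": "cross-impact analysis",
    "diversity": "specialization",
    "dynamics": "business process management",
    "imponderability": "risk management",
}

# Assumption: every combination of low/high ratings is one complexity type.
TYPES = list(product((False, True), repeat=len(DIMENSIONS)))

def suggest(ratings: Dict[str, bool]) -> List[str]:
    """Return the operations for every dimension rated high; unrated means low."""
    return [OPERATIONS[d] for d in DIMENSIONS if ratings.get(d, False)]

print(len(TYPES))  # 32
print(suggest({"interdependency": True, "imponderability": True}))
# ['cross-impact analysis', 'risk management']
```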

In the end, the reduction of uncertainty by heuristic processes can be attributed to all tasks mentioned in this sub-chapter. Heuristics are defined as conscious or unconscious processes that efficiently ignore information (Gigerenzer & Gaissmaier, 2011). To define a system, or to be able to talk about a system at all, it has to be instantiated by a meaningful boundary or Sinngrenze, which is done by ignoring information, i.e. by relying on selection rules. Once the system is defined, measuring its complexity is affected not only by objective complexity but also by subjective complexity, which again involves ignoring information stemming from subjective simplicity, e.g. by relying on expert knowledge. To categorize complexity, models such as the “MIDDI” model (Jeschke, 2017) can be applied to distinguish multiple types of decision-making scenarios, so that suitable problem-solving operations can be used to reduce uncertainty in a context-specific manner. These operations rely on approved and proficient methods and ultimately ignore alternative approaches, and therefore information, in order to be capable of acting efficiently and effectively.

The three tasks of defining a system, measuring the system’s complexity and categorizing its complexity all frame reality by ignoring information, balancing the amount of relevant information against the costs associated with processing it. Heuristic decision-making is not applied in all decision-making scenarios mentioned in this sub-chapter, but it is applied when a suitable model is developed (e.g. defining a system), when a model is adapted to context (e.g. measuring system complexity) and when models are linked (e.g. categorizing complexity), in order to frame limitless information and make cost-efficient or cost-effective predictions. Therefore, to make economically favorable decisions, an agent needs to possess as much information as possible in order to frame it productively. A game-theoretical analysis showed that it is more favorable to possess information than merely to have access to it (Ravid, Roesler, & Szentes, 2019): agents must be incentivized to gather costly information, they overlook information when its price is in equilibrium, and cheap information does not necessarily approximate full information. Ravid, Roesler and Szentes (2019) thereby underline the need to design information channels through which agents in a given decision-making system, such as a market, can learn, as knowing that certain information can be obtained is not the same as actually knowing this information (Ravid et al., 2019).


2.1.4 Ignoring Information 
While there is debate on whether the concept of heuristic search has been falsified, whether it can be falsified in the Popperian manner at all, or whether it was even an empirical hypothesis (Ohlsson, 2012), the concept of heuristic search is still brought into context with “planning” in more recent studies (Baier, Bacchus, & McIlraith, 2007). Baier, Bacchus and McIlraith use a simplified “relaxed planning graph” that ignores information on negative effects. In other words, they compute a simplified model of reduced complexity and use it to estimate the costs of achieving a certain goal (Baier et al., 2007, p. 614).
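The core relaxation can be sketched in Python under the assumption of a STRIPS-style encoding with hypothetical actions: every action's delete (negative) effects are simply dropped, and reachable facts are expanded layer by layer. The number of layers needed until all goal facts appear serves as a cheap cost estimate (an h_max-style value for unit action costs). Baier et al. (2007) build considerably richer heuristics on top of such a graph; this shows only the idea of ignoring negative effects.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # preconditions
    add: FrozenSet[str]     # positive effects
    delete: FrozenSet[str]  # negative effects, ignored by the relaxation

def relaxed_layers(state: Iterable[str], goal: Iterable[str],
                   actions: Iterable[Action]) -> Optional[int]:
    """Layers of the relaxed planning graph until all goal facts are reached.
    Returns None if the goal is unreachable even in the relaxed problem."""
    facts, goal, actions = set(state), set(goal), list(actions)
    layers = 0
    while not goal <= facts:
        new = set(facts)
        for a in actions:
            if a.pre <= facts:
                new |= a.add  # delete effects are deliberately never applied
        if new == facts:
            return None  # fixpoint reached without the goal
        facts, layers = new, layers + 1
    return layers

f = frozenset
actions = [
    Action("pick", f({"at_a", "hand_empty"}), f({"holding"}), f({"hand_empty"})),
    Action("move", f({"at_a"}), f({"at_b"}), f({"at_a"})),
    Action("drop", f({"at_b", "holding"}), f({"delivered"}), f({"holding"})),
]
start = {"at_a", "hand_empty"}
print(relaxed_layers(start, {"delivered"}, actions))  # 2 (the real plan needs 3 actions)
```

Because `move` no longer deletes `at_a` in the relaxation, `pick` and `move` fit into the same layer, so the estimate underestimates the true plan length while being far cheaper to compute.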

Analogous to modern approaches that model planning paths using heuristic search, the original idea of heuristic search also considered humans as information-processing entities who simplify reality by ignoring information due to their biological limitations (Simon & Newell, 1971). Just like the algorithms mentioned above, many papers from the 1970s considered humans to conduct “heuristic processing”, defined as an efficient problem-solving method for difficult problems that ignores certain solutions in the set of possible solutions. This restriction is based on certain evaluations of the problem structure (Payne, 1976).

The most famous research on “heuristics” comes from Kahneman and Tversky, who described three major heuristics used in human decision-making under uncertainty: “availability”, “representativeness” and “anchoring and adjustment” (Tversky & Kahneman, 1974). “Under uncertainty” refers to any decision-making process in which the probabilities of the events in the state space are not known. Decisions can also be made “under risk”, where subjective or objective probabilities are provided. This basic differentiation dates back to 1921 and is still used to categorize decision-making scenarios (Knight, 1957). When a decision is made “under certainty”, the consequence of each possible action is known (Mousavi & Gigerenzer, 2014).

In other words, human decision-making was and still is theorized to be influenced by beliefs about the likelihood of events when subjective or objective probabilities are not provided. Linked to this set of heuristics, Kahneman and Tversky described a list of “biases”, which represent deviations from normative rational theory caused by errors in memory retrieval or by violations of basic laws of probability (Gilovich, Griffin, & Kahneman, 2002).

Kahneman and Tversky’s heuristics-and-biases program has been challenged and criticized by the famous psychologist Gerd Gigerenzer (Gigerenzer, 1996). Gigerenzer (2011) states that heuristics are neither rational nor irrational. While heuristics can outperform statistical decision-making in complex environments, as rational models perform badly under uncertainty caused by complexity (Mousavi & Gigerenzer, 2014), their accuracy depends on environmental circumstances. People are able to learn to choose adaptively from a collection of heuristics; Gigerenzer further states that it is necessary to develop simple decision-making guidelines for complex environments and to connect the simple heuristics framework with other theoretical frameworks (Gigerenzer & Gaissmaier, 2011).
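One well-known heuristic from the simple-heuristics program, take-the-best (not discussed in the text above), illustrates what "efficiently ignoring information" means in practice: binary cues are checked in order of their validity, and the first cue that discriminates between two options decides, so all remaining cues are ignored. The minimal Python sketch below uses a hypothetical city-size task; the cue names, their order and the values are illustrative assumptions.

```python
from typing import Dict, List, Optional

def take_the_best(a: Dict[str, int], b: Dict[str, int],
                  cues_by_validity: List[str]) -> Optional[str]:
    """Compare two options on binary cues (1 = present, 0 = absent), ordered from
    most to least valid. The first discriminating cue decides and all remaining
    cues are ignored. Returns 'a', 'b', or None if no cue discriminates."""
    for cue in cues_by_validity:
        if a[cue] != b[cue]:
            return "a" if a[cue] > b[cue] else "b"
    return None  # no cue discriminates: guess or fall back to another strategy

# Hypothetical task: which of two cities is larger?
cues = ["is_capital", "has_airport", "has_university"]
city_a = {"is_capital": 0, "has_airport": 1, "has_university": 0}
city_b = {"is_capital": 0, "has_airport": 0, "has_university": 1}
print(take_the_best(city_a, city_b, cues))  # a  (has_university is never consulted)
```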

According to Artinger et al. (2015), decision-making under uncertainty does not necessarily benefit from logic and statistics. Their research showed that decisions made in complex and uncertain environments actually benefit from simple heuristics, as these are less sensitive to chaotic environmental disturbances, such as variance in data, and thus generate less error (Artinger, Petersen, Gigerenzer, & Weibler, 2015). An intuitive example in which a much simpler heuristic decision-making rule outperformed a more complex model under uncertainty is the comparison of the “simple hiatus rule” with the “Pareto/NBD” model. Here, the complex model incorporates more information than the heuristic approach, but the heuristic approach resulted in better predictions (Samson & Gigerenzer, 2016).
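The hiatus rule itself fits in a few lines. In this Python sketch a customer is classified as active if the last purchase lies within a fixed hiatus; nine months is the threshold commonly reported for this rule, but it is treated here as an adjustable parameter, and whether the boundary month counts as active is an assumption. The customer data are hypothetical, and the Pareto/NBD model it was compared with is not reproduced.

```python
from datetime import date

def months_between(earlier: date, later: date) -> int:
    """Whole calendar months between two dates (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def is_active(last_purchase: date, today: date, hiatus_months: int = 9) -> bool:
    """Hiatus rule: active if the last purchase lies within the hiatus.
    All other purchase history (frequency, amounts) is ignored."""
    return months_between(last_purchase, today) <= hiatus_months

today = date(2020, 1, 15)
customers = {"c1": date(2019, 11, 2), "c2": date(2018, 12, 20)}
print({name: is_active(last, today) for name, last in customers.items()})
# {'c1': True, 'c2': False}
```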

From these insights it can be derived that heuristic decision-making still plays an important role in modern approaches to coping with complexity. Not only does it seem natural for humans to use heuristics when making decisions under uncertainty; such an approach can also outperform statistical and logical models in anticipating developments when computed by machines. Nevertheless, uncertainty is an important factor to consider when predicting complex behavior. A case study has shown that failing to include stochastic effects derived from uncertainty in models analyzing traffic led to prediction biases of up to 200% (Calvert, Taale, Snelder, & Hoogendoorn, 2018). Still, decision-making using heuristics is not a one-size-fits-all tool that outperforms statistical and logical computations in all circumstances. It rather presents itself as a skill that can be learned to overcome bias and reduce uncertainty, in order to make predictions that outperform chance in complex environments.


2.1.5 Uncertainty 


Living beings, such as cognitive systems or decision-making agents, can be considered complex systems whose behavior can be extremely challenging to predict in uncertain or novel decision situations (Hernan et al., 2015). In day-to-day life, the neural system reacts to different levels of uncertainty in a complex way, and subjective utility theory fails to model human behavior correctly. According to the reduction of uncertainty hypothesis, the human brain might be biased towards data that reduce uncertainty (Onnis, Christiansen, Chater, & Gómez, 2002).

In a purely formal, mathematical context, as required for simulation, uncertainty has crisp definitions and even its own “Uncertainty Theory”, which has become a branch of mathematics (Liu, 2018). This thesis relies on the explanation of uncertainty as

“any departure from the unachievable ideal of complete determinism”

(Walker et al., 2003, p. 8). Risk and ambiguity are regarded as “limiting cases of a general system evaluating uncertainty”, where decision-makers differ in their preference for or aversion to risk and ambiguity (Hsu, Bhatt, Adolphs, Tranel, & Camerer, 2005).

The overall meaning of uncertainty varies and depends on the scientific field and domain in which it is used. However, uncertainty is part of organizational day-to-day reality (Schilke, Wiedenfels, Brettel, & Zucker, 2017). In enterprises, for example, uncertainty in decision-making is dealt with by information systems such as expert systems, enterprise resource planning and supply chain management (Irani, Sharif, Kamal, & Love, 2014). Project management is dominated by models that assume or build upon determinism (Padalkar & Gopinath, 2016), although real-world problems mostly come with only incomplete or approximate information, which limits the uncertainty-reducing capabilities of even an idealized algorithm (Traub, Wasilkowski, Wozniakowski, Bartholdi, & Ford, 1985). With scientific progress stemming partly from quantum physics, it was already considered “unscientific” more than 60 years ago to assume infinite accuracy in any measurement; inevitable errors must be included in any theory, as they are part of the sense-making of an environment, which makes strict determinism in scientific prediction impossible (Brillouin, 1959). This perspective also applies to economic predictions, as uncertainty prevails even when a lot of information is provided (Walker et al., 2003). In meteorology, inevitable uncertainties in initial conditions and model equations led to a shift from predicting the single most likely outcome to predicting a probability distribution, as well as to the understanding that “doubt” needs to be included and represented in forecasts (Palmer, 2017). This new process of modelling predictions is also influenced by external third parties: scientists need to withstand pressure, stemming from media attention, to predict in a more deterministic way than the given data justify (Palmer, 2017).

The urge to avoid or work around unavoidable uncertainty might stem from “intolerance of uncertainty”, which has been described as the “most fundamental, underlying variable of anxiety disorders” (Gosselin et al., 2008, p. 1428). “Uncertainty avoidance”, intensively researched as a cultural factor since Hofstede’s work in the 1970s, failed to show significance in a more recent experiment when applied outside the IBM study (Schmitz & Weber, 2014). Other studies, however, still build upon the hypothesis that cultures express different levels of “uncertainty avoidance” (Hofstede, 2001) and have succeeded in finding correlations, e.g. with participation in decision-making (Jang, Shen, Allen, & Zhang, 2018). Nevertheless, when talking about problem-solving, “uncertainty” has to be considered: a model linking uncertainty and cognition has shown that, despite complete certainty about the final stage of a decision-making process taking place in a vast cognitive space that represents complexity, uncertainty does not stop growing (Hadfi & Ito, 2013). To cope with the inevitable persistence of uncertainty in algorithms and heuristic problem-solving, it was suggested to translate “complex problem solving” into “finding ways of reducing uncertainty” (Osman, 2017).


2.2 The Role of Information in Decision-Making 


In order to understand “information”, it might be meaningful to ask: “How much information do I acquire when I learn something new?” According to the “Kullback-Leibler divergence”, the amount of information gained depends on what the agent believed before (Baez & Pollard, 2016). If the agent assumed a fair coin toss, i.e. a 50% chance of heads, it gains one bit of information when heads appears. If the agent expects only a 25% chance of heads, it gains two bits of information when heads actually appears.
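The arithmetic of the coin example can be checked directly. In this minimal Python sketch, the information gained is computed as the Kullback-Leibler divergence of the posterior belief (after heads is observed, certainty about heads) from the prior belief, measured in bits. When the posterior is certain, this reduces to -log2 of the prior probability of the observed outcome, which reproduces the one and two bits stated above.

```python
import math
from typing import Sequence

def kl_divergence_bits(posterior: Sequence[float], prior: Sequence[float]) -> float:
    """D_KL(posterior || prior) in bits; outcomes with zero posterior mass contribute 0."""
    return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)

def information_gain(prior_heads: float) -> float:
    """Bits gained when heads is observed, given the prior belief in heads."""
    posterior = (1.0, 0.0)  # after observing heads: (heads, tails)
    prior = (prior_heads, 1.0 - prior_heads)
    return kl_divergence_bits(posterior, prior)

print(information_gain(0.5))   # 1.0
print(information_gain(0.25))  # 2.0
```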

This example helps to define “information”. Just like “uncertainty” and “complexity”, the term “information” is used in many ways, both in everyday language and in scientific contexts. The following sections show different perspectives on and definitions of information, how information can lead to uncertainty, and to what extent information influences 21st-century decision-making.


2.2.1 Definitions of Information 


The coin-toss example mentioned above builds upon the Shannon and Weaver model, in which information content is expressed in “bits”. The amount of information I is computed as I = log₂ n, with n being the number of different, equally likely output values. This model can be seen as translating the coin-tossing process into bits: a process that receives some belief about the future as input and translates it into some output, expressed in bits by the Shannon model. From this perspective, information reveals something about the input and its linked process. However, information is neither the process itself nor the input or the output per se; the output expressed in bits is merely information about the input (belief) and the process (coin toss and model) (Losee, 1998). Still, the Shannon and Weaver model remains limited to functional terms.
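A short Python sketch of this measure, taking the logarithm to base 2 so that the result is expressed in bits, alongside Shannon's entropy H = -Σ p_i log2 p_i for comparison. For n equally likely outputs both give log2 n, which also matches the physical reading in which the entropy of a system about which nothing is known equals the logarithm of its number of possible states; any bias towards some outputs lowers the entropy. The example distributions are hypothetical.

```python
import math
from typing import Iterable

def hartley_bits(n: int) -> float:
    """Information of one output among n equally likely outputs: log2(n)."""
    return math.log2(n)

def shannon_entropy_bits(probs: Iterable[float]) -> float:
    """Average information per output: H = -sum(p * log2(p)), skipping p = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(hartley_bits(2))                   # 1.0  (fair coin)
print(shannon_entropy_bits([0.5, 0.5]))  # 1.0
print(shannon_entropy_bits([0.25] * 4))  # 2.0
print(shannon_entropy_bits([0.9, 0.1]))  # about 0.469 (biased coin)
```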

In physics, information is commonly described in terms of the entropy of a system. When nothing is known about a certain system, its entropy equals the logarithm of the number of possible states. Whether the observer of a system has to be included in the description of information, and whether the observer can be seen in isolation, is still debated in physics today (Brukner, 2018). While the problems and methods of quantum physics might seem too far-fetched and abstract for day-to-day economic decision-making, the intellectual basis for developing problem-solving models is identical in the two fields of study. “Bayesian inference” is used in the thought experiment described by Brukner (2018), and it is also common in game theory and in neuroscientific models of the human mind and brain. Knowledge or belief about a certain system, and knowledge about the knowledge of others, is part of game-theoretical analysis, as described in “The Dirty Faces and the Sage” (Fudenberg & Tirole, 1991, p. 547).

Using the “Hierarchical Model of Information Transmission”, more abstract notions such as human perception, observation, belief and knowledge, as well as the influence of errors, misinformation and bad data, can be considered (Losee, 1998). Based on this model, Losee (1998) provided a discipline-independent definition of information as the output of some process, where the output tells something about both the input and the process from which it originated.

This definition links the meaning of information to a process that might have an impact on the behavior of an agent that is aware of the output of this process. An analogous definition describes information as “a stimulus which expands or amends the World View of the informed.” (Madden, 2004, p. 9), the stimulus being the impact that follows the perception of some signal and alters the agent’s “World View”. The introductory thought experiment (Gedankenexperiment) about the tribe holding different belief groups is also based on the latter definition. Whatever information is, it leads to constant updates of an agent’s world view. When multiple agents influence each other’s decision-making, game-theoretical models come into play.

In game theory, information is considered “private information” when it is only obtainable by an individual agent, such as “a random thought or intrinsic motives”. “Public information” refers to information that is potentially obtainable by all agents who are part of the “game” or decision-making frame. A typical assumption of game theory is that agents hold common knowledge about the given information structure of the game and about the co-agents’ rationality. It is further assumed that agents do so by conducting complicated mathematical calculations, i.e. applying Bayes’ theorem without error when updating their beliefs (McKelvey & Page, 1990). McKelvey and Page (1990) show that this game-theoretical assumption about human behavior is approximated by experienced subjects with 85% efficiency and by inexperienced subjects with 69% efficiency. The concept of the “Bayesian brain” is often considered by psychologists, neuroscientists and cognitive scientists. The model assumes that the human brain constantly predicts possible events and deviations from what is expected by performing Bayesian inferences, and in doing so, the brain is constrained by the requirement to minimize costs stemming from error (Hutchinson & Barrett, 2019, p. 280).
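The error-free Bayesian updating that game-theoretical models assume, and that the "Bayesian brain" account attributes to the brain, can be sketched over a discrete set of hypotheses. The hypotheses and likelihoods in this Python example are hypothetical: an agent holds beliefs about whether a coin is fair or biased towards heads and updates them after each toss.

```python
from typing import Dict

def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(h | data) = P(data | h) * P(h) / sum over h' of P(data | h') * P(h')."""
    unnormalized = {h: likelihood[h] * p for h, p in prior.items()}
    evidence = sum(unnormalized.values())
    return {h: v / evidence for h, v in unnormalized.items()}

# Hypothetical hypotheses: P(heads | hypothesis)
p_heads = {"fair": 0.5, "biased": 0.8}
belief = {"fair": 0.5, "biased": 0.5}  # prior belief

for toss in "HHTH":
    lik = {h: (p if toss == "H" else 1.0 - p) for h, p in p_heads.items()}
    belief = bayes_update(belief, lik)  # yesterday's posterior is today's prior
print(belief)
```

Updating toss by toss gives the same result as updating once on the whole sequence, which is why the current belief can be treated as a summary of all prior evidence.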

Hutchinson and Barrett (2019) hypothesize that mental events do not arise independently but always depend on prior events. This hypothesis can be linked to Durlauf’s (1998) understanding that “history matters” for complex systems such as cognitive agents. In contrast to the more “simplistic model” of a cognitive agent receiving a “stimulus” and translating it via the perceptive senses into a “response”, Hutchinson and Barrett (2019) offer a different model of both mind and brain, defining “information flow” from a novel psychological and neuroscientific view.

As shown in Figure 2.5, both mind and brain are in a constant fluent state. Each state consists of a non-linear, complex system of neuronal activities (green arrows) and feedback (purple arrow), which are separated into mind and brain activities. In short, neurons activate memory, from which certain “maps” of strategies are derived. Just like a scientific hypothesis, neurons try strategies in accordance with this map, choosing paths that proved useful in the past, which are then corrected by feedback. In a way, the brain simulates strategies by predicting the future based on past experiences, hence the “Bayesian brain”. When the distance between the chosen path and the correcting feedback is too great, this distance can be considered an “error”, and the neuron can correct this error by altering its path, i.e. by correcting a prediction error. When the chosen path equals the feedback, the predicting neuron is already on its correct path (prediction), and the hypothesis was correct.
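The loop of prediction, feedback and correction can be caricatured with a simple prediction-error (delta-rule) update. This is a deliberate simplification and not Hutchinson and Barrett's model: it covers only correction by environmental feedback, not by anticipated prediction error. The agent's estimate moves towards each observation by a fraction of its prediction error, so the error shrinks as the prediction comes to fit the environment. The learning rate and the observation stream are hypothetical.

```python
from typing import Iterable, List, Tuple

def predictive_loop(observations: Iterable[float], prediction: float = 0.0,
                    learning_rate: float = 0.3) -> Tuple[float, List[float]]:
    """Correct a prediction by a fraction of each prediction error.
    Returns the final prediction and the history of prediction errors."""
    errors = []
    for observed in observations:
        error = observed - prediction          # prediction error (feedback vs. prediction)
        prediction += learning_rate * error    # move the prediction towards the feedback
        errors.append(error)
    return prediction, errors

final, errs = predictive_loop([10.0] * 20)   # a stable environment
print(round(final, 3), [round(e, 2) for e in errs[:4]])
```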

In a certain way, the brain constantly predicts the future and is constantly corrected by environmental feedback. More importantly, the human brain is also corrected by anticipated prediction error, and therefore not exclusively by environmental feedback. Each combined mind and brain state can be considered a “screenshot” of the agent’s “World View”: a complex, non-linear network of trial and error, constantly working on reducing uncertainty by choosing strategies that fit the current context.

Figure 2.5 Model of the human brain functionality as a fluent state. Source Hutchinson & Barrett, 2019, p. 283


2.2.2 Derivation of a Definition for Information 


These examples show that “information” can be described as a fluent process, which itself can be described in “packages”, such as bits, that provide an evaluation of the distance between the agent’s belief and reality. Physics, informatics and neuroscience can be combined in order to better understand information and its influence on human decision-making. In the end, a clear definition of information cannot be given; however, this thesis relies on the definitions of information by Losee (1998) and Madden (2004), integrating them into the novel predictive processing framework by Hutchinson and Barrett (2019). Losee (1998) holds that information can be expressed by some value. While this value itself is not information, it is informative about the input and process from which it is derived. As an internal model or “World View” can be altered both by a stimulus and by the mere anticipation of a stimulus, as shown by Hutchinson and Barrett (2019), information is not regarded solely as a stimulus, as defined by Madden (2004). The predictive processing framework (PPF) shows that each internal model is both input and process, so input and process cannot be clearly distinguished in PPF, as Losee (1998) requires in order to make sense of the information value. In accordance with PPF, a state is linked to a new state by a fluent transition process consisting of frequent updates of predictions and of the prediction-error distance, while this “linkage” also serves as the process. In PPF, each state is both input and process, or is best described as a fluent state.

Building upon the core statements of Losee (1998) that information can be expressed by some informative value, of Madden (2004) that information alters the internal model of a cognitive system, and of Hutchinson and Barrett (2019) that cognitive agents react both to external stimuli and to stimulated anticipation, the following is derived:

If each of these complex fluent states of an observer were grasped in isolation at time t and labelled by an integer indicating its order of experience, and if an information-theoretical function (e.g. a logarithm-based measure) were applied to obtain an informative value, then, in theory, a string of these fluent states would be identical to the entire experiences and all possible prediction results of the observing agent at time t. Information can then be regarded as a redundant function operating on itself, embedding an uncertainty value on possible outcomes, with this value depending on the agent’s experience (chain of information states) and its belief (prediction vs. prediction error).


2.2.3 Information Perturbing Events in Behavioral Experiments


Fudenberg and Tirole (1991) close their work “Game Theory” with a remarkable insight. First, they explain that finite-state-space games are not outmatched by approximations using infinite-state-space models, as the latter can have a very different set of equilibria. Second, uncertainty about another agent’s information can lead to state spaces that are even uncountably infinite. Therefore, in real-life economic decision-making, where uncertainty is inevitable and can only be reduced to zero by accepting some “deception potential”, a game-theoretical model will either have to cope with uncountable infinity or rely on potentially imprecise and therefore unreliable approximations. Third, while finite state spaces are used in practical applications of game-theoretical models, their sensitivity to perturbations, which can lead to entirely different outcomes,

“is another reason to think seriously about the robustness of one’s conclusions to the information structure of the game.”

(Fudenberg & Tirole, 1991).

In other words, human decision-making can hardly be captured and simulated by game-theoretical models, as “doubt”, mathematically expressed by perturbing some integer, can lead to entirely different outcomes. Even “heuristic approaches” are not immune to such perturbing events. Uncertainty or “doubt” stemming from deception or from the way information is presented strongly influences experiments in behavioral economics and psychology. In the following, three major perturbing events are briefly described: deception, the “frame effect” and the “order effect”.

In short, while deception is commonly used in psychological experiments, it is far less accepted, if at all, in economics (Krawczyk, 2019). The “frame effect” describes how human decision-making is influenced by the way in which different choice options are presented (Tversky & Kahneman, 1981), whereas the “order effect” concerns belief updating (Trueblood & Busemeyer, 2011). Deception, frame effects and order effects can influence the maintenance and refutation of an agent’s beliefs, which is a critical process in sequential decision-making (Yoshida & Ishii, 2006). All three effects can be manipulated in order to obtain different decision-making results or to “nudge” agents, e.g. by using the frame effect to display the information on a search engine’s result page in such a way that the agent’s choices are improved (Benkert & Netzer, 2018), or by using the order effect to make agents take riskier decisions (Aimone, Ball, & King-Casas, 2016). According to most economists, deception leads to noisy data and is considered unethical (Houser & McCabe, 2013), while not a few psychologists have seen deception as a way to produce useful results (Christensen, 1988). More recent research has shown that experimental economists’ aversion to deception is justified (Ortmann & Hertwig, 2005); however, to this day there is neither a clear definition of deception nor agreement on when deception is used in an experiment (Krawczyk, 2019).

2.2.4 Making Decisions in a VUCA World


As mentioned before, most real-world problems are ill-defined. Human agents solve problems by ignoring information (heuristics), which works well to reduce complexity and to solve problems under uncertainty. In order to apply heuristic decision-making successfully, or to ignore information effectively, information has to be collected first. In this thesis, information was characterized i) as modelled by fluent states, ii) as linked to an informative value building upon information theory, iii) as observer-dependent, and iv) as a redundant function that alters uncertainty. In the preceding sections, it was noted that models, experiments and therefore decision-making outcomes are sensitive to information perturbing events caused by deception or by the “order” or “framing” of information, and that experimental economics rejects deception because it leads to noisy data. All of these circumstances surrounding real economic problems and the complex role of information lead to the conclusion that today’s world exhibits characteristics that make reliable long-term predictions challenging. This conclusion is expressed by the term “VUCA world”.

“VUCA” stands for “volatility, uncertainty, complexity and ambiguity” (Dörner &
Funke, 2017, pp. 2-3) and is commonly used in economic contexts to refer to the
unpredictable nature of today’s economic decision-making domain. Its four features
resemble the attributes of complex systems: complexity, connectivity, dynamics, and
goal conflicts (Dörner & Funke, 2017). The term VUCA has been used in a variety of
contexts, such as to describe modern battlefield (Nindl et al., 2018), work (Seow,
Pan, & Koh, 2019), and decision-making environments (Giones, Brem, & Berger, 2019).
The VUCA acronym has also been misused, e.g. to give the impression that leadership
is powerless to plan ahead and strategize (Bennett & Lemoine, 2014). On the
contrary, the VUCA framework can help to strategize and plan ahead effectively,
even when the decision-making environment exhibits the features of a complex system.

As shown in figure 2.6 (Green, Page, De’ath, Pei, & Lam, 2019, p. 2), two simple
questions can be derived from the VUCA framework and asked in order to categorize a
complex system: “How well can you predict the results of your actions?” and “How
much do you know about the situation?”.

The contents of these two questions can be linked to “expert knowledge”. In their
famous work “Human Problem Solving”, Simon & Newell (1971) found that expert chess
players outperformed novice chess players in recalling and reproducing the
positions of chess pieces after viewing them for 5 seconds. Experts remember, and
thus hold, more knowledge about the chess game. Consequently, experts seem to
outperform novices when it comes to the question “How much is known about the
situation?”. Perceptual-cognitive research has shown that expert surfers were more
likely than amateur surfers to judge waves as too risky (Furley & Dörr, 2016).
Experts might then also outperform novices in complex problem solving when
answering “How well can results of actions be predicted?”. The overall question is
then how expert knowledge can be defined and whether or not it helps in problem
solving in a VUCA world. This question is answered in detail in the next chapter.

[Figure: VUCA definitions after Bennett & Lemoine (left) and Pasmore, O’Shea & Horney (right)]

Figure 2.6 Dimensions of complex systems. Source Green, Page, De’ath, Pei, & Lam, 2019,
p. 2


2.3 Expert Knowledge and Problem-Solving 


According to Zeleny (2005), information is only symbolic acting, whereas knowledge
is true acting, which cannot be replaced by any amount of information. The author
sees information as a necessary ingredient but an insufficient recipe for effective
volition (Zeleny, 2005). This is because codified knowledge became information, and
information technology did not replace social interaction; it was necessary to
transform information into effective action and not the other way around (Zeleny,
2005). The author further states that while there can be “too much information”,
there cannot be “too much knowledge”. These statements fit the previously mentioned
problems arising from information-based models with high degrees of complexity,
which ultimately produce uncertainty instead of reducing it. Just as the model by
Hutchinson and Barrett (2019) distinguished between mind and brain, or modelling
and acting, there exists an analogous distinction between information and
knowledge. Knowledge relies on operative acts of measurable volition rather than on
words or letters (Zeleny, 2005).

Just as knowledge, according to these claims, is not captured by information
systems, there are perspectives holding that expertise is not captured by knowledge
management systems: expertise is mainly the result of tens of thousands of hours of
acting (Trevelyan, 2014). According to Trevelyan (2014), expertise has to pass
three tests in order to be considered as such: first, it has to lead to
consistently superior performance; second, it has to lead to volition or concrete
outcomes; and third, it has to be measurable.

Therefore, action or volition seems to be the key factor combining “knowledge” and
“expertise”. The resulting term “expert knowledge” is defined in more detail below,
followed by a short description of how expert knowledge is used as a resource and
how it is linked to learning.


2.3.1 Definition of Knowledge, Expertise and Expert Knowledge


Theoretical philosophy, building upon ancient Greek philosophy, defines “knowledge”
as


“justified true belief, or true opinion combined with reason” 


(Hilpinen, 1970, p. 109). This abstract approach to defining “knowledge” leads to
logical discussions of whether the information $I_1$ of person A knowing some event
$p_1$, which includes some uncertainty $c$ that this knowledge is wrong, and the
information $I_2$ of person A knowing that A himself knows $p_1$, with $I_2$ also
containing $c$, are the same information ($I_1 = I_2$) or not ($I_1 \neq I_2$). It
also leads to “ad infinitum problems”, such as “A knows that A knows that A knows
... that $p_1$”, each level containing $c$, or to the paradox that there cannot
exist knowledge, since uncertainty $c$ is always part of some information
(Hilpinen, 1970).

For the business environment, a definition of “knowledge” that is modelled less
abstractly than attempts stemming from “epistemology” was found to be more
suitable. The meaning of “knowledge” is to be found in the domain of the cognitive
sciences (Bolisani & Bratianu, 2018). By defining some discrete system, such as
“the static object of knowledge”, the “frame problem” would arise again. There are
also studies claiming that knowledge is not bounded by the work of one single agent
but is the result of an intellectual collective, such that knowledge is considered
“cognitive contact”, in which assumptions about reality arise from acts of
intellectual confrontation with others (Zagzebski, 2017). To provide a more
business-oriented definition, and to overcome the frame problem that arises when
knowledge is modelled with discrete states, so-called “fluid flows” are used,
leading to the definition of knowledge as “stocks and flows” (Bolisani & Bratianu,
2018, p. 19). This definition applies to both explicit and tacit knowledge and has
to be combined with paradigms from physics regarding “entropic uncertainties”
(Bolisani & Bratianu, 2018). This leads to the three “rational, emotional, and
spiritual fields” defining knowledge (Bolisani & Bratianu, 2018, p. 24). The
rational field of knowledge is objective and explicit, expressed through language
and logic. The emotional field of knowledge is subjective and context-dependent,
resulting from the body’s responses to the external environment. The spiritual
field of knowledge concerns ethics and values, which are essential in corporate
decision-making (Bolisani & Bratianu, 2018).

Decades ago, research in human problem solving found that expertise requires large
amounts of knowledge: the expert has experienced many relevant patterns of a
decision frame, and these patterns efficiently guide the expert towards the
relevant parts of this knowledge (Larkin, McDermott, Simon, & Simon, 1980). These
knowledge stores contain a variety of patterns that help with problem
interpretation and problem-solving, while at the same time providing essential and
relevant cues (Larkin et al., 1980). Larkin et al. (1980) describe intuition as
largely the ability to use “pattern-indexed schemata”, which distinguishes experts
from novices in problem-solving. This broad definition of expertise links to the
more recent understanding of “expert performance” as reflecting high-level,
circumstantial adaptation skills that result from long periods of experience and
volition (Ericsson & Charness, 1994). Patterns leading to expertise are thus
acquired automatically within the confined domain where acting happens, and
above-average performance is the result of this iterative process. Defining and
selecting “experts” solely based on their years of experience, e.g. for Delphi
panels, is a debated selection process, and collective forecasting performance
does not necessarily depend on a decision-making panel containing more experienced
experts. In “Delphi decision-making groups”, the total amount of expertise necessary
remains uncertain (Baker, Lovell, & Harris, 2006).

Based upon these definitions of knowledge and expertise, expert knowledge is
obtained by constant iterative acting in a certain confined domain, in which the
agent adapts to experienced patterns, becomes more efficient in solving problems in
the chosen domain, and fluently alters rational, emotional, and spiritual mental
models, doing so in constant exchange with other people.
Therefore, expert knowledge lives from acting. According to McBride & Burgman
(2012), expert knowledge is important for applied ecology and conservation, as
these fields exhibit complex dynamics in which action is required to reduce
uncertainty. When empirical data are lacking, expert knowledge is commonly seen as
the best available source of information; expert knowledge is simply what agents
know from practice, training, and experience, and it manifests itself in the
effective recognition of context-relevant information and in efficient problem
solving (McBride & Burgman, 2012).


2.3.2 Expert Knowledge as a Resource 


Making predictions in complex and non-linear decision environments can benefit
from expert knowledge, but expert knowledge is no guarantee of precise forecasts.
Age and work experience do not necessarily predict performance, and expert
knowledge is context-sensitive and has to be embedded in a suitable decision-making
domain and framework (McBride & Burgman, 2012). Engineers, for example, debated for
many decades whether system design was an intuitive art form or a scientific
process that had to be systematized; nowadays, engineers rely on a mixed bag of
instruments and a more holistic viewpoint when it comes to design, including
complexity management, workflows, and cognitive systems (Kreimeyer, Lauer,
Lindemann, & Heyman, 2006). While iterations of acting result in learning and thus
build expertise, such iterations have to be minimized in order to reduce costs, as
described by the commonly used “Pahl and Beitz Systematic Approach” framework
(Kannengiesser & Gero, 2017). Applications of lean and agile software development
are growing (Tripp, Saltz, & Turk, 2018) and show an interest in embedding expert
knowledge in more lightweight and flexible frameworks. This is done to reduce costs
and to be able to react efficiently to unpredictable change (Saini, Arif, &
Kulonda, 2017). In other words, in a complex environment, expert knowledge is
handled as a resource to save capital and to better handle uncertainties. This
concept is used in “sustainable management” and referred to as “salutogenesis”
(Müller-Christ, 2014), which describes how capital can be used in order to remain
capable of acting and reacting to unforeseeable events. So even though expert
knowledge does not necessarily produce optimal results, it is still considered an
important factor when facing dynamic decision environments, and it can be
effectively included in modern frameworks that save capital, leading to more
sustainable problem-solving solutions.
2.3.3 The Role of Learning


According to Simon and Newell (1971), human decision-making is shaped by both
cognitive and environmental characteristics (Campitelli & Gobet, 2010). This is
called the expertise approach, which combines the understanding of expertise and
decision-making. Campitelli & Gobet (2010) suggest that Simon’s expertise approach
should be included in decision-making research: experiments should test for the
level of expertise and apply different environmental circumstances. Experiments
should contain participants with different levels of expertise in order to show
whether or not experts and novices show different levels of bias, as predicted by
Tversky and Kahneman (1981), and when and why such cognitive illusions disappear,
as stated by Gigerenzer (1996). According to the “Simon and colleagues’ approach”,
different environmental circumstances should be applied (Campitelli & Gobet, 2010),
such that domain-specific expertise can be compared across domains, in order to
test whether or not environmental circumstances have an impact on decision-making,
whether this impact correlates with expertise, and whether the type of heuristics
applied by participants actually changes. Campitelli & Gobet (2010) also suggest
that computational models that fit data on human behavior in a multitude of domains
are more meaningful than models that analyze human behavior in more specific cases.

Theories in behavioral economics seek generality by adding parameters
incrementally, such that results and models can easily be compared with other,
more general models; even though adding behavioral assumptions to models of human
behavior makes them less tractable, behavioral models can outperform traditional
ones in precision when operating in domains of dynamics and strategic interaction
(Camerer & Loewenstein, 2004). Behavioral economics relies on field experiments,
computer simulations, and brain scans, and Camerer & Loewenstein (2004) describe
behavioral economists as methodological eclectics who make use of psychological
insights (Camerer & Loewenstein, 2004, p. 7), which distinguishes behavioral
economics from experimental economics. “Behavioral game theory” generalizes the
standard assumptions of game theory using experimental evidence, provides a model
for “learning” in complex environments, and even includes neuroscientific evidence
to support models of economic behavior (Camerer & Loewenstein, 2004).
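As an illustration of a learning model built by adding parameters to simpler rules, the following Python sketch implements the Experience-Weighted Attraction (EWA) rule developed by Camerer and Ho, one well-known example from behavioral game theory; it is not claimed to be the specific model Camerer & Loewenstein (2004) refer to here. Each strategy’s attraction is updated from actual and foregone payoffs, and the parameters φ (`phi`, decay of past attractions), δ (`delta`, weight on foregone payoffs), ρ (`rho`, decay of the experience weight), and the logit sensitivity λ (`lam`) nest simpler rules: with δ = 0 only the chosen strategy is reinforced, as in reinforcement learning, while δ = 1 together with ρ = φ yields belief-based learning (weighted fictitious play). The function names and the two-strategy payoffs are hypothetical.

```python
import math


def ewa_update(attractions, experience, chosen, payoffs, phi, delta, rho):
    """One step of the Experience-Weighted Attraction (EWA) learning rule.

    attractions: current attraction A_j(t-1) of each strategy j
    experience:  experience weight N(t-1)
    chosen:      index of the strategy the agent actually played
    payoffs:     payoff each strategy would have earned against the others' play
                 (actual payoff for `chosen`, foregone payoffs for the rest)
    phi:         decay of previous attractions
    delta:       weight on foregone payoffs (0 = only the chosen strategy learns)
    rho:         decay of the experience weight
    """
    new_experience = rho * experience + 1.0
    new_attractions = []
    for j, (attraction, payoff) in enumerate(zip(attractions, payoffs)):
        # The chosen strategy is reinforced fully; unchosen ones by delta times their foregone payoff.
        weight = 1.0 if j == chosen else delta
        new_attractions.append((phi * experience * attraction + weight * payoff) / new_experience)
    return new_attractions, new_experience


def logit_choice_probabilities(attractions, lam):
    """Map attractions to choice probabilities with a logit rule of sensitivity lam."""
    top = max(attractions)  # subtract the maximum for numerical stability
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical round: the agent played strategy 0 and earned 3; strategy 1 would have earned 5.
start, n0 = [0.0, 0.0], 1.0
reinforcement, _ = ewa_update(start, n0, chosen=0, payoffs=[3.0, 5.0], phi=1.0, delta=0.0, rho=0.0)
belief_like, _ = ewa_update(start, n0, chosen=0, payoffs=[3.0, 5.0], phi=1.0, delta=1.0, rho=1.0)
print(reinforcement, logit_choice_probabilities(reinforcement, lam=1.0))
print(belief_like, logit_choice_probabilities(belief_like, lam=1.0))
```

With δ = 0 the agent only strengthens the strategy it played, whereas with δ = 1 the foregone payoff of the better unplayed strategy also counts and shifts choice probabilities towards it. Intermediate values of δ interpolate between the two special cases, so the simpler rules remain testable as parameter restrictions of the more general model.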

Reisch & Zhao (2017) describe behavioral economics as a theory that does not view
the consumer as a rational Homo oeconomicus, but as displaying “bounded
rationality”, as described by Kahneman (2003) and Simon (1955), whose deviations
from rationality are predictable “errors”. Behavioral economics relies on the
“information paradigm” in the sense that consumer behavior is shaped by the
information provided and by consumers’ learning progress in the form of
preferences, biases, and heuristic strategies; however, models building upon this
understanding have found that even small incentives can have a big impact on
decision-making (Reisch & Zhao, 2017). Key findings of behavioral economics include
several biases and heuristics from prospect theory and mental accounting, and they
are used to design choice contexts; as consumers make decisions context-dependently,
results from behavioral economic models can be used to nudge consumers (Reisch &
Zhao, 2017).

The influence of expert knowledge, the “expertise approach” of decision-making
research, and behavioral economics find common ground in the domain of “learning”.
While “expertise” was defined as an “extreme adaptation”, “learning” too is linked
to the concept of adaptation: it is defined as “ontogenetic adaptation”, i.e. an
observed change in an agent’s behavior that stems from regularities in the agent’s
environment (De Houwer, Barnes-Holmes, & Moors, 2013). To acquire a clear
understanding of behavioral changes, it is recommended to rely on this functional
definition of “learning” and to acquire information about when exactly learning
occurs, so that insights of a cognitive nature can also be derived (De Houwer et
al., 2013). Experiments should then control when learning occurs in order to
measure behavioral changes effectively, stepping away from inefficient models that
understand learning as a “mental mechanism” (De Houwer et al., 2013, p. 641).
Experiments can be designed so as to include the “expertise approach”, behavioral
economics, and “expert knowledge” through this understanding of “learning”: the
three concepts would meet in software-based experiments in which controlled
contextual changes increase the probability of behavioral changes, which can then
be compared between novice and expert problem solvers, i.e. participants who have
performed only a few or many iterations of the experiment before, drawing on
decades of insights into how biases and heuristics influence decision-making.

The next chapter will introduce the concept of learning, how it is related to 
measured behavioral changes, which are often influenced by biases and heuristics, 
and how individual agents can be understood as “disturbances”. 


2.4 Agents Acting as Disturbances 


According to Erev and Roth (2014), mainstream behavioral economics attempts to find
deviations from the rational model and offers descriptive models. The authors
discuss human learning in order to find domains where people learn fast and
maximize their expected return, and to better understand how the structure of an
economic environment influences behavior (Erev & Roth, 2014). Important insights
concern feedback and its influence on decisions. When feedback is limited to the
chosen option, that is, when the consequences of discarded options are not
revealed to the agent, the behavioral impact of negative outcomes lasts longer than
the impact of good outcomes. This is because bad outcomes reduce the probability
that the agent tries the option again and thereby re-evaluates it (Erev & Roth,
2014). Through such restricted exploration, agents can form an “attitude” towards
options in which unjustified negative prejudices are hardly ever overcome (Fazio,
Eiser, & Shook, 2004).
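The asymmetry described above, sometimes called the “hot stove effect”, can be reproduced with a small simulation. The following Python sketch is a hypothetical illustration, not the setup used by Erev and Roth (2014) or Fazio et al. (2004): a greedy agent tracks the average payoff of a safe option (always 0) and of a risky option whose payoffs are drawn from a normal distribution with mean 1 and standard deviation 3, so the risky option is better in expectation. The payoff values, the greedy choice rule, the number of periods and agents, and all function names are assumptions. The sketch compares agents who only observe the payoff of the option they chose with agents who also observe foregone payoffs.

```python
import random


def final_choice_is_risky(rng, full_feedback, periods=100, risky_mean=1.0, risky_sd=3.0):
    """Simulate one greedy learner choosing between a safe and a risky option.

    The safe option always pays 0; the risky option pays a normally distributed
    amount with a positive mean, so it is the better option in expectation.
    With full_feedback=False the agent only observes the payoff of its choice.
    Returns True if the agent prefers the risky option at the end.
    """
    estimates = {"safe": 0.0, "risky": 0.0}
    counts = {"safe": 0, "risky": 0}
    for _ in range(periods):
        if estimates["risky"] == estimates["safe"]:
            choice = rng.choice(["safe", "risky"])  # break ties at random
        else:
            choice = max(estimates, key=estimates.get)  # greedy: pick the higher estimate
        payoffs = {"safe": 0.0, "risky": rng.gauss(risky_mean, risky_sd)}
        observed = payoffs if full_feedback else {choice: payoffs[choice]}
        for option, payoff in observed.items():
            counts[option] += 1
            # incremental sample mean of all payoffs observed for this option
            estimates[option] += (payoff - estimates[option]) / counts[option]
    return estimates["risky"] > estimates["safe"]


def share_preferring_risky(full_feedback, agents=1000, seed=7):
    rng = random.Random(seed)
    return sum(final_choice_is_risky(rng, full_feedback) for _ in range(agents)) / agents


partial = share_preferring_risky(full_feedback=False)
full = share_preferring_risky(full_feedback=True)
print("feedback on chosen option only:", partial)
print("feedback on all options:       ", full)
```

With feedback limited to the chosen option, a single early loss can push the risky option’s estimate below zero, after which the option is never sampled again and the estimate is never corrected; with full feedback, nearly every agent ends up preferring the risky option. The exact shares depend on the assumed parameters and the random seed.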

Exploration can be described as a requirement for obtaining information during
complex problem solving, since in such problem-solving scenarios, information is
hidden from the agents at the outset. As most real economic problems are complex
or can be considered problems under uncertainty, this chapter, and in fact this
thesis as a whole, will mainly consider problems under uncertainty. There exists a
mathematical expression of the continuum from risk to uncertainty, stemming from
the “bias-variance theory”, written as


$$\text{total error} = (\text{bias})^2 + \text{variance} + \varepsilon,$$
where $\varepsilon$ denotes noise. The meaning of this continuum is explained very
intuitively by Gerd Gigerenzer in his introductory article “Taking Heuristics
Seriously” to the whitepaper “The Behavioral Economics Guide 2016” (Samson &
Gigerenzer, 2016).

As depicted in figure 2.7, the player on the left shows a bias towards the bottom
right but next to no variance, and performs better overall than the player on the
right, who shows no bias but high variance and obtains a lower score. This
intuitive example shows that error can stem from either bias or variance.
According to Mousavi & Gigerenzer (2014), fine-tuned complex models lead to high
variance when applied to different samples, while heuristics with fixed parameters
have no variance, but bias. Still, problems under risk differ from problems under
uncertainty, and while uncertainty is part of many day-to-day situations in real
life, uncertainty has to be reduced to a form of risk in order to make calculations
dealing with uncertainty compatible with risk calculations (Mousavi & Gigerenzer,
2014). Nevertheless, this thesis mainly assembles theoretical background on
problems under uncertainty, while not ignoring important aspects of problems under
risk.
[Figure: dartboard depiction of the two errors in prediction, bias and variance. The
bull’s eye represents the unknown true value to be predicted; each dart represents a
predicted value based on a different random sample of information. Bias is the
distance between the bull’s eye and the mean dart location; variance is the
variability of the individual darts around their mean.]


Figure 2.7 Bias vs. Variance. Source Samson & Gigerenzer, 2016, p. VIII 
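The decomposition can be checked numerically. The following Python sketch is a hypothetical dart-throwing simulation, not taken from Samson & Gigerenzer (2016): a “flexible” predictor re-estimates the target from each small random sample (no bias, high variance), while a “heuristic” predictor always uses the same fixed value (bias, no variance). The true value of 10, the noise level, the sample size of 3, the heuristic’s fixed guess of 9, and the function names are assumptions chosen for illustration. Because the target itself is held fixed, the noise term ε is zero here, and the mean squared error splits exactly into squared bias plus variance.

```python
import random
import statistics


def decompose(predictions, true_value):
    """Split the mean squared error of predictions of a fixed target into bias^2 and variance."""
    mean_prediction = statistics.fmean(predictions)
    mse = statistics.fmean((p - true_value) ** 2 for p in predictions)
    bias_squared = (mean_prediction - true_value) ** 2
    variance = statistics.pvariance(predictions)  # spread of the darts around their own mean
    return mse, bias_squared, variance


def throw_darts(true_value=10.0, noise_sd=4.0, sample_size=3, throws=2000, seed=1):
    rng = random.Random(seed)
    flexible, heuristic = [], []
    for _ in range(throws):
        # each throw is based on a different small random sample of noisy observations
        sample = [rng.gauss(true_value, noise_sd) for _ in range(sample_size)]
        flexible.append(statistics.fmean(sample))  # re-estimated from every sample: unbiased, high variance
        heuristic.append(9.0)  # fixed parameter that ignores the sample: biased, zero variance
    return decompose(flexible, true_value), decompose(heuristic, true_value)


flexible_result, heuristic_result = throw_darts()
for name, (mse, bias_sq, var) in [("flexible", flexible_result), ("heuristic", heuristic_result)]:
    print(f"{name:9s} total error={mse:.2f} bias^2={bias_sq:.2f} variance={var:.2f}")
```

With these assumed values the biased heuristic has the lower total error, mirroring the left-hand dart player in figure 2.7; with larger samples the flexible predictor’s variance shrinks and the ranking can reverse.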


The following subchapter addresses the importance of feedback and its potential
influence on subsequent decisions during complex problem solving under
uncertainty, where the agent has to explore and possibly adapt to contextual
changes. The subchapters after it specify the role of non-routine tasks and of
routine strength in decision-making, derive non-routine problem solving, and
provide a short summary of these insights by referring to “complexity economics”.


2.4.1 The Role of Feedback in Complex Problems Under Uncertainty


According to Van der Kleij, Feskens, & Eggen (2015), there is no generally
accepted model of how feedback produces learning, but there is some evidence of a
positive relationship between feedback and learning in computer-based settings.
However, Van der Kleij et al. (2015) also note that these conclusions are not
sufficient to explain the detailed relationship between feedback and learning.
They define feedback as follows: “Winne and Butler (1994)
suggested
“feedback is information with which a learner can confirm, add to, overwrite, tune,
or restructure information in memory, whether that information is domain know- 
ledge, meta-cognitive knowledge, beliefs about self and tasks, or cognitive tactics 
and strategies” (p. 5740).” 


(Van der Kleij et al., 2015, pp. 2-3). The meta-analysis by Van der Kleij et al.
(2015) considered 40 studies on the influence of item-based feedback on learning in
computer-based environments. “Item-based” feedback means that agents receive
immediate or delayed feedback on every item (Van der Kleij et al., 2015). Rich
feedback led to more effective “higher order learning” outcomes than “simple
feedback”, which is defined as feedback that only indicates whether a response is
correct. Simple feedback is considered to be effective for “lower order learning
outcomes” (p. 8). “Lower order learning” is restricted to recalling, recognizing,
and understanding concepts, with no need to actually apply this knowledge. “Higher
order learning” requires the application of knowledge in novel domains, which is
referred to as “transfer” (Van der Kleij et al., 2015, p. 5).

As people tend to think in short-sighted causal relations, commonly assume that an
effect has a single cause, and stop searching for causes once they have found the
first satisfactory explanation, agents perceive only limited amounts of the
feedback that could reinforce or correct their strategies (Sterman, 2006). Time
delays in feedback processes confound the agents’ ability to learn, leading
decision makers to keep performing corrections even when enough corrective action
has already been taken “to restore equilibrium” (Sterman, 2006, p. 508).
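The overcorrection caused by time delays can be illustrated with a minimal stock-adjustment simulation in Python. This is a hypothetical sketch in the spirit of system dynamics, not a model taken from Sterman (2006): in every period a decision maker orders a correction equal to a fixed fraction of the perceived gap between a stock and its goal, but each correction only arrives after a fixed delay. The goal of 100, the delay of 3 periods, the adjustment fraction of 0.5, and all names are assumptions. One variant ignores the corrections that are still under way; the other subtracts them from the perceived gap.

```python
from collections import deque


def adjust_stock(count_pipeline, goal=100.0, delay=3, adjustment=0.5, periods=30):
    """Simulate a decision maker who corrects the gap between a stock and its goal.

    Corrections take `delay` periods to arrive. If count_pipeline is False, the
    decision maker ignores corrections already under way, i.e. misperceives the
    delayed feedback; if True, those corrections are subtracted from the gap.
    Returns the stock level observed in each period.
    """
    stock = 0.0
    in_transit = deque([0.0] * delay)  # corrections ordered but not yet arrived
    history = []
    for _ in range(periods):
        stock += in_transit.popleft()  # a correction ordered `delay` periods ago arrives
        gap = goal - stock
        if count_pipeline:
            gap -= sum(in_transit)  # subtract corrections that are still on their way
        in_transit.append(adjustment * gap)  # negative values are corrections downwards
        history.append(stock)
    return history


naive = adjust_stock(count_pipeline=False)
aware = adjust_stock(count_pipeline=True)
print("ignoring the delay: peak", max(naive))
print("accounting for it:  peak", max(aware))
```

Ignoring the corrections under way makes the stock overshoot the goal and then oscillate around it, whereas accounting for them lets the stock approach the goal from below without overshooting. Under these assumptions, both variants use the same adjustment fraction; they differ only in how the decision maker perceives the delayed feedback.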

According to Sterman (2006), “learning is a feedback process”, as depicted in
figure 2.8, where both the dynamics of a complex system and all learning depend on
feedback. When deviations from expected states are perceived, agents take actions
that they think will close the gap. Therefore, strategies are influenced by
misperceptions of feedback, unscientific reasoning, and biases. In order to learn
under conditions of high uncertainty, such as during a crisis, this “expected
states gap” is closed by pre-training, virtual reality, learning by imitation,
communication, information systems, past experiences, and operating standards
(Moynihan, 2008). It is assumed that knowledge gathered before facing a complex
problem under uncertainty helps agents to perform better when solving it. While
Moynihan (2008) stresses that ad-hoc learning during a problem under uncertainty is
possible, novel routines should be explored before a network of agents is required
to use them.
[Figure: “Learning is a feedback process”. The diagram shows the main impediments to learning; arrows indicate causation.]


Figure 2.8 All learning is a feedback process. Source Sterman, 2006, p. 506 


In conclusion, all learning results from feedback, while learning outcomes are
influenced by the quality of feedback. Simple learning outcomes already benefit
from feedback that solely indicates the correctness of a response, while transfer
requires more sophisticated feedback, such as
“hints, additional information, extra study material, and an explanation of the correct
answer.” 


(Van der Kleij et al., 2015, p. 4). In situations of high uncertainty where such
additional information cannot be provided, prior knowledge or the exploration of
new routines can be helpful. The latter case, typical of real-world crisis
management, is referred to as “non-routine problem solving”. The following
subchapter introduces this concept in greater detail.


2.4.2 Novel Problems, Real-World Problems, and Non-routine Tasks


Thorough experimental results from the “BeanFest paradigm”, which tests the
relation between exploratory behavior and attitude formation (Fazio et al., 2004),
show that whether agents consider a novel decision alternative good or bad, at
least in a virtual world, depends on their weighting bias. Beans could be eaten or
not, resulting in either positive or negative effects. Beans differed in shape and
pattern, and participants were able to do better than chance by clustering the
beans’ appearances, as shown in figure 2.9.

The experiment varied several conditions, such as providing feedback for all beans
or only for the chosen bean, and framing the experiment as gaining points or losing
life points. In all conditions, the game was a performance-based experiment. When
an agent faces a novel alternative, in the form of a new bean, in this
experimental environment, which simulates a problem under uncertainty with
item-based feedback provided only for the chosen option, the agent’s choice can
partly be predicted by the common “negativity bias”. Participants who learned the
positive and negative decision alternatives (beans) equally well tended towards a
negative response, generally showing a negativity bias towards novel beans (Fazio,
Pietri, Rocklage, & Shook, 2015). Agents are influenced by the appearance of beans
and their resemblance to patterns from prior experience (Fazio et al., 2015). The
“valence weighting bias” is defined by whether an agent tends to classify a novel
bean as bad more often than would be expected from the agent’s learning pattern.
It is regarded “as a fundamental personality characteristic”, as
[Figure, left: the population of bean stimuli forming the 10 x 10 matrix, reprinted
from Deutsch and Fazio (2008). Right: the bean matrix; X refers to shape, from
circular (1) to oblong (10), and Y to the number of speckles, from 1 to 10. Beans
presented during the learning phase of the BeanFest procedure are marked with their
positive (+) or negative (−) value. In any given study, the bean values are
typically reversed for half the participants; this counterbalancing has not been
found to influence outcomes.]


Figure 2.9 “Bean-Fest” causal structure. Source Fazio et al., 2015, p. 107 
“Individuals’ valence weighting proclivities have proved relevant to sensitivity to
interpersonal rejection, threat assessment, neophobia, decisions about risky alternatives,
intentions to engage in novel risk behaviors, actual risk behavior, emotional reactivity 
to a failure experience, the expansion of friendship networks, and changes in depressive 
symptoms.” 


(Fazio et al., 2015, p. 117). Unfortunately, Fazio et al. (2015) found that the
valence weighting bias cannot be self-reported via questionnaires. Also, their
findings are limited to experiments in which decision alternatives provide visual
cues, so that the Bayesian brain finds fruitful material for learning. Nevertheless,
the BeanFest experiment makes it possible to simulate a decision-making environment
in which each problem is novel and different, and it further shows that individual
differences lie at the very core of problem solving.
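A toy model can make the notion of a valence weighting bias more concrete. The following Python sketch is a hypothetical illustration, not the procedure or scoring used by Fazio et al. (2015): beans are points on the 10 x 10 grid of shape and speckle count shown in figure 2.9, a novel bean is judged by its similarity-weighted resemblance to learned good and bad beans, and a parameter `negativity_weight` (an assumption) lets resemblance to bad beans count more than resemblance to good beans. The exponential similarity function, the two learned beans, and all names are likewise assumptions.

```python
import math


def similarity(bean_a, bean_b):
    """Hypothetical similarity on the 10 x 10 (shape, speckles) grid: decays with distance."""
    return math.exp(-math.dist(bean_a, bean_b))


def classify(novel_bean, learned, negativity_weight=1.0):
    """Classify a novel bean from the learned beans it resembles.

    learned maps (shape, speckles) -> +1 (good) or -1 (bad).
    negativity_weight > 1 lets resemblance to bad beans count more than
    resemblance to good beans: a toy stand-in for a valence weighting bias.
    """
    score = 0.0
    for bean, value in learned.items():
        weight = negativity_weight if value < 0 else 1.0
        score += weight * value * similarity(novel_bean, bean)
    if score < 0:
        return "bad"
    if score > 0:
        return "good"
    return "undecided"


def share_classified_bad(learned, negativity_weight):
    """Share of all unlearned grid beans that the toy agent would classify as bad."""
    novel = [(x, y) for x in range(1, 11) for y in range(1, 11) if (x, y) not in learned]
    return sum(classify(b, learned, negativity_weight) == "bad" for b in novel) / len(novel)


# Hypothetical learning set: one good and one bad bean, equally well learned.
learned = {(3, 5): +1, (7, 5): -1}
print(classify((5, 5), learned, negativity_weight=1.0))
print(classify((5, 5), learned, negativity_weight=1.5))
print(share_classified_bad(learned, 1.0), share_classified_bad(learned, 1.5))
```

With equally learned good and bad beans, a weight of 1 leaves a bean midway between them undecided, while a weight above 1 tips it, and some other novel beans, towards “bad”, so a larger share of novel beans is avoided although learning was identical. In this toy model, that excess is what a valence weighting bias would capture.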

According to systems theory, problems exist in real life, not only in science:
reality reacts to problems by selection, and problems are described as


“real and effective catalysts of social life” 


(Luhmann, 2012, p. 173). Chapter 2 has so far defined many aspects of real
economic problems. Most problems in reality are ill-defined, lack clear
instructions on how to solve them, occur under uncertainty, are solved by humans via
heuristics, are complex, need to be solved by acquiring information or knowledge,
are disturbed by errors such as bias, are usually solved through many
interdependent decisions, require experience and learning to be solved, and are
embedded in an opaque network of cause-effect relations whose feedback signals are
not easily interpreted correctly by humans.

Studies on learning from feedback in real-world problems or economic problems in a
complex environment are scarce. Keil et al. (2016) describe learning from
performance feedback in complex environments, where outcomes are observed with a
time delay and where a multitude of actions combine to generate outcomes, across
the research and development stages of 98 large US pharmaceutical companies from
1993 to 2013 (Keil, Kostopoulos, Syrigos, & Meissner, 2016). Here, the authors
focus on a real-world “order effect” of information: negative feedback, such as
performance below aspirations, in an early development stage is interpreted
differently, and leads to different actions, than negative feedback in later
development stages. In addition, Keil et al. (2016) distance themselves from
classical models of experiential learning regarding positive feedback. They argue
that performance above expectations creates a buffer whose size increases the
likelihood of organizational risk-taking. An increasing tolerance of organizational
risk-taking was described as favoring the search for novelties when performance is
above aspirations, possibly leading to a shift in the company’s core project
management (Keil et al., 2016). Organizational recession can also have a positive
impact, as it conserves unexplored potential, nourishing firms during times of
“uncontrolled exogenous adversity” (Levinthal & March, 1981, p. 309).

Another core finding was that research should concentrate more on the
relationship between cognitive biases and behavioral learning (Keil et al., 2016),
and that the interpretation of performance feedback plays an important role in
experiential learning in complex processes. How performance feedback is
interpreted, and which action follows, also depends on the “order effect”, i.e. the
stage of the R&D process (W.-R. Chen & Miller, 2007), in accordance with prospect
theory (W. Chen, 2008).

From these examples it can be concluded that real-world problems and their related
decision-making processes indeed depend on both the interpretation and the order of
feedback information. For this very reason, the three information-disturbing
effects, the “frame effect”, the “order effect”, and deception, were mentioned
earlier. Purposeful deception, such as lying, also commonly disturbs feedback
information in real-world problems and is referred to as “real world deception”
(Fuller, Biros, & Delen, 2011).

Due to the high levels of uncertainty in complex environments, pre-training,
exploration and routines are essential in coping with real-world problems,
especially when time pressure does not incentivize investing in reflection time,
novel routines or finding new alternative paths. Hospital settings represent such
a complex decision-making environment with high time pressure. To reduce costs, the
concept of “shared decision-making” and consumer education was tested as early as
2000 using software. Here, treatments were not only chosen by the physician based
on clinical considerations; the treatment choice was also influenced by the
patient's values and preferences (Holmes-Rovner et al., 2000). However, as the study
showed, the program faced many problems, which can collectively be explained by the
effects of “information interpretation”, personal bias, and problems stemming from
the initial hurdles of novel routines: physicians restricted treatments to patients
who wanted additional information about the treatment. Physicians also decided not
to participate in the randomized study because of their personal enthusiasm for the
program, thereby trying to avoid inducing bias through their participation. As a
result, physicians did not implement the new shared decision-making process as a
routine.

All three problems restricted the implementation of a new routine; in other words,
they turned the task of implementing the non-routine, shared decision-making
program into a real-world problem and a tough challenge. Individual characteristics
when facing novelties, uncertainties stemming from unknown causal relations (due to
misinterpretation of information or order effects), and a lack of resources to
pre-test novel strategies make handling non-routine tasks difficult. Despite these
difficulties, non-routine tasks are considered part of the important “21st Century
Skills” needed to cope with a VUCA world, where circumstances change frequently,
and they are linked to performance in complex problem solving (Neubert, Mainert,
Kretzschmar, & Greiff, 2015).

In order to observe non-routine decision-making and measure the related non-routine
problem-solving performance in an environment that does not incentivize reflection
time, i.e. where reflecting on a problem carries a time cost, Strunz & Chlupsa (2019)
developed a valid application-test scenario in the form of a web-browser-based
online experiment. Its methods and findings are described in greater detail in
section 2.4.4; to prepare for this, problem solving and the role of routine are
introduced in the following sub-chapter.


2.4.3 Problem Solving Search and Routine Strength 


“In everyday speech the term problem solving refers to activities that are novel and
effortful.” However, not all tasks


“feel like problem solving. Some activities, like solving a Tower of Hanoi problem 
(...) feel like problem solving, whereas other more routine activities, such as using a 
familiar computer application (...) do not. (...) Newell (1980) argued that the dimension 
of difference between routine problem solving and real problem solving is the amount 
of search involved. (...) Newell claimed that we transit smoothly into problem-solving 
search and indeed that much of human cognition is a mixture of routine problem solving 
and problem solving that involves search. This claim is realized in his Soar model of 
cognition (Newell, 1990)” 


(Anderson, 1993). 

When researching problem solving, the Tower of Hanoi task was one of the first
experimental tools used (Anderson, 1993), and it is still applied in research today.
Tower of Hanoi has been used in the psychology of problem solving (Hinz, Kostov,
Kneißl, Sürer, & Danek, 2009), in neuroscientific research (Ruiz-Diaz,
Hernandez-Gonzalez, Guevara, Amezcua, & Agmo, 2012), to test executive function and
planning (Donnarumma, Maisto, & Pezzulo, 2016), and to study working memory
(Numminen, Lehto, & Ruoppila, 2001), and it is used with children, adolescents, and
adults from general and clinical samples (Robinson & Brewer, 2016). Tower of Hanoi
(ToH) consists of simple rules, which are explained in greater detail in chapter 3.
For now, as can be seen in figure 2.9, all that should be noted about the game is
that it always consists of some “state”, such as a starting configuration with five
disks stacked on the leftmost peg. The player then has to apply some “operator” to
transform one state into a new state, e.g. by moving a disk onto another peg.
Following J. R. Anderson (1993), a “problem space” is then defined by both “state”
and “operator”. When all possible connections between states are modelled by
applying only valid operators, the entire state space represents the problem space
(Anderson, 1993). Whether humans hold a similar mental representation of this
problem space is still of interest to recent research, and results show that the
total time required to solve a ToH problem is proportional to its complexity;
complexity is defined as the problem-space distance between the game's start and
goal state, as well as the complexity of the solution and its associated
computational costs (Donnarumma et al., 2016). As Donnarumma et al. (2016) show,
humans have trouble engaging in counterintuitive moves, which are considered more
complex because they require the agent to “look ahead” when playing ToH. The authors
also link “subgoaling” to the possible mental representation of a problem space,
where the problem is divided into smaller portions that have to be solved. The
concept of “subgoals” is based upon scientific evidence that human behavior follows
a hierarchical structure, in which basic and simple actions are clustered into
subtasks, which themselves can be combined to achieve higher-order goals (Solway et
al., 2014).
According to Donnarumma et al. (2016), the subgoal concept can explain suboptimal
decisions in problems that require counterintuitive moves: humans tend to simply
draw a “direct path” from start to goal state, being aware only of the perceptual
distance, whereas the “subgoal” model derives an implicit metric from the problem
space, and this implicit metric has a great impact on the decision-making outcome.
Human problem solving, or human search, is sensitive to its prior and often
suboptimal implicit mental representation. Implicit measures are considered useful
for predicting behavior and analyzing changes in mental problem representations
(Blanton & Gawronski, 2019).
Human problem solving is also sensitive to routines. Routine is defined as a
“behavioral option that comes to mind as a solution”, which is not considered to be
a strategy but the “behavioral option that is most strongly associated with a
specific decision situation” (Betsch et al., 2001, p. 24). According to Betsch et al.
(2001), prior-belief effects in participants with strong routines made them
reluctant to overcome the routine, despite novel feedback suggesting that a change
of routine would be a lucrative option. Participants who experienced high success
rates acting upon a certain strategy, and who thus developed a strong routine,
adapted at slower rates. However, participants with a strongly induced routine
adapted instantly when novel feedback could be understood or correctly interpreted
through prior knowledge. In their second experiment, Betsch et al. (2001) showed that
participants with strong routines fell for the confirmation bias when tasks were
framed as similar, but were able to discard old strategies when a task was
explicitly described as novel. All in all, routine strength significantly influences
decision-making, yielding confirmation biases in information acquisition and
sensitivity to how tasks are framed. Still, confirmatory tendencies can be overcome
when a task is described as novel. Adaptation in recurring decision-making is slowed
by strongly induced routine, and high routine strength correlates with the
underestimation or neglect of feedback that encourages overcoming the routine,
i.e. a change in routine strategy (Betsch et al., 2001).

Extrinsic incentives, such as financial rewards, are generally assumed to influence
human decision-making performance. McDaniel & Rutström (2001) compared two different
theories regarding extrinsic reward, intrinsic reward and performance using a Tower
of Hanoi experiment. While extrinsic reward can come in the form of bonus pay,
intrinsic reward has been studied by observing monkeys solving mechanical puzzles
repeatedly. The animals did so without extrinsic reward such as food. It was
therefore concluded that there are actions which are motivated intrinsically and
performed for their “own sake”, independent of extrinsic incentives (Eisenberger
& Cameron, 1996, p. 1154).

The first theory analyzed by McDaniel & Rutström (2001) is the psychological theory
of “detrimental reward” effects. The authors interpreted it in two different ways:
first, whether an increase in extrinsic reward lowered the perceived attractiveness
of the problem to be solved, reducing intrinsic reward and thus effort, which led to
worse performance overall; second, whether an increase in extrinsic reward induced a
distraction effect, reducing productivity. The second theory, and third hypothesis,
was named “costly rationality” theory and stated that an increase in extrinsic reward
leads to an increase in effort and performance. Extrinsic reward was implemented as
error costs, which differed between the low- and high-cost treatments. Therefore, an
increase in error costs, i.e. an increase in penalty, was interpreted as a decrease
in extrinsic reward. In short, participants used more time when the penalty was
increased. The authors interpreted the increase in time use as high effort, and the
increased penalty as a decrease in extrinsic reward, thus rejecting their first
hypothesis (McDaniel & Rutström, 2001). McDaniel & Rutström also found the penalty to
have an insignificant effect on performance; they observed considerable individual
variation in performance, potentially dominating any treatment effect, which they
found to be in line with previous research. However, whether individual variation was
the true cause of the insignificant treatment effect is described as unclear
(McDaniel & Rutström, 2001). Executive function, defined as “a combination of working
memory and inhibition inhibitory processes” (Zook, Davalos, DeLosh, & Davis, 2004,
p. 286), has been found to predict heterogeneous performance in Tower of Hanoi
experiments.

Betsch et al. (2001) used a “microworld simulation” to research the influence of
routine strength. In order to measure complex problem solving, which includes
non-routine problem solving, software-based methods use either the microworlds
mentioned above or “minimal complex systems”. Different influences on non-routine
problem-solving performance and their measurement procedures, as well as the current
scientific debate on their usefulness and on how non-routine problem solving (NPS)
can be measured using a software-based “minimal complex system”, are explained in
the following sub-chapters.


2.4.4 NPS: Adaptation, Beliefs, Response Times and Emotion 


In order to research human decision-making in dynamic and complex domains, complex
computer-simulated scenarios were proposed, which are meant to shed light on how
agents perform complex problem solving (CPS) under uncertainty (Funke, 2014).
Realistic, computer-simulated problems with multiple changing and interdependent
variables, also referred to as microworlds (Funke, 2014), require a certain order of
actions to be solved efficiently and effectively (Güss, Fadil, & Strohschneider,
2012). Due to the complexity of such problems, the decision-making agent cannot
possibly retrieve all causal relations and therefore has to optimize its strategies
through heuristics; here, cultural differences were found. Differences in problem
solving were explained by differences in strategic expertise, which themselves are
based on heterogeneous cultural learning environments (Funke, 2014). Significant
differences in NPS performance by country of origin (India, the United States and
Germany) were confirmed, but whether this difference was related to characteristics
of the learning environment remained unclear (Strunz, 2019).

While recent research on cultural influences in CPS was less clear (Güss, 2011), and
the influences of cultural uncertainty avoidance were at times conflicting (Güss et
al., 2012), strategy making remains a strong predictor of performance in CPS. This
suggests that complex and knowledge-rich problems require not only heuristic
decision rules but also general and domain-specific knowledge (Funke, 2014). Experts
were found to spend more time exploring and to show higher adaptability and
flexibility in their strategy making, which predicted performance (Güss, Devore
Edelstein, Badibanga, & Bartow, 2017).

Minimal complex systems are less complex, and their causal structure can be
identified with strategies that support precise causal analysis. For example, the
“Vary One Thing At a Time” (VOTAT) strategy can be applied to the minimal complex
system “MicroDYN”, whose causal structure is displayed in figure 2.10, to obtain full
information on the structure and behavior of the problem (Funke, 2014, p. 2). There
seem to be two schools of thought on whether performance in complex problem solving
can be measured equally well with less complex simulations, i.e. “minimal complex
systems”. How to clearly define and conduct “Complex Problem Solving” (CPS)
experiments is still heavily debated (Greiff, Stadler, Sonnleitner, Wolff, & Martin,
2015; Funke, Fischer, & Holt, 2017; Greiff, Stadler, Sonnleitner, Wolff, & Martin,
2017). There is agreement on how to measure CPS performance insofar as participants
have to overcome barriers that arise from the opacity of relevant information and
from uncertainty about the true causal relations governing the problem's
functionality (Strunz & Chlupsa, 2019).
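The following Python sketch illustrates how VOTAT can recover causal structure in a minimal complex system. It assumes the linear structural-equation form commonly used to describe MicroDYN items, x_{t+1} = A x_t + B u_t, with exogenous inputs u and endogenous outputs x, and it assumes the solver can restart from the same state. The matrices, the reset assumption and all function names are hypothetical and are not taken from Funke (2014) or Greiff et al. (2012).

```python
from typing import Callable

Vector = list[float]
System = Callable[[Vector, Vector], Vector]  # (state, inputs) -> next state


def make_linear_system(A: list[list[float]], B: list[list[float]]) -> System:
    """Hypothetical black box: x_next = A x + B u (hidden from the problem solver)."""
    def step(x: Vector, u: Vector) -> Vector:
        return [
            sum(a * xk for a, xk in zip(A[i], x)) + sum(b * uj for b, uj in zip(B[i], u))
            for i in range(len(A))
        ]
    return step


def votat(system: System, x0: Vector, n_inputs: int) -> list[list[float]]:
    """Vary One Thing At a Time: from the same start state, compare a round with
    all inputs at zero (baseline) to rounds in which exactly one input is set to 1.
    The difference isolates that input's direct effect on every output variable."""
    baseline = system(x0, [0.0] * n_inputs)  # eigendynamics and side effects only
    effects = []
    for j in range(n_inputs):
        u = [0.0] * n_inputs
        u[j] = 1.0  # vary exactly one input
        varied = system(x0, u)
        effects.append([v - b for v, b in zip(varied, baseline)])
    # effects[j][i] is the effect of input j on output i; transpose to B's layout.
    return [[effects[j][i] for j in range(n_inputs)] for i in range(len(x0))]


if __name__ == "__main__":
    A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 0.5]]  # hypothetical dynamics
    B = [[2.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 3.0, 0.0]]  # hypothetical input effects
    print(votat(make_linear_system(A, B), [10.0, 10.0, 10.0], 3) == B)  # True
```

Comparing against a zero-input baseline from the same start state separates the inputs' direct effects from the eigendynamics and side effects among endogenous variables (matrix A), which this sketch does not attempt to identify.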

Two other important influences on performance in CPS are environmental changes and
the learning of counterintuitive concepts. Both have been mentioned before.
Environmental conditions predict learning and maximization (Erev & Roth, 2014),
while routine strength can lead to confirmation bias and a failure to adapt a
strategy (Betsch et al., 2001). According to evidence from CPS simulations, and as
found in Strunz & Chlupsa (2019), environmental changes only change participants'
behavior when those changes actually interfere with the performance of an agent's
strategy (Cañas, Quesada, Antolí, & Fajardo, 2003).

As explained before, performing counterintuitive actions is difficult for humans
(Donnarumma et al., 2016). Even when environmental conditions have an impact on an
agent's strategy, overcoming its routine strategy might require counterintuitive
concepts or the realization that one is deceiving oneself with a mental model, which
by definition is always an incorrect representation of reality (Sterman, 2002).


Figure 2.10 Causal structure of Minimal Complex System “MicroDYN” (exogenous and
endogenous variables of a typical MicroDYN item). Source: Greiff et al., 2012, p. 192


Learning and knowledge are described
as essential for coping with a change in routine, as shown in a study on supply
chain management (Scholten, Sharkey Scott, & Fynes, 2019). Scholten, Sharkey Scott
& Fynes (2019) describe various types of learning and knowledge processes that need
to be implemented in order to adapt operating routines to uncertainties stemming
from supply chain disruptions. One aspect found to be of significant importance is
reflecting on positive outcomes in order to use the full potential of knowledge
creation (Scholten et al., 2019). As described before, positive performance feedback
can result in taking more risks (Keil et al., 2016), meaning that an
oversimplification of some above-average performance, or a misinterpretation of the
causal relation that led to the good performance, can result in overly risky and
costly actions by decision-makers who have not spent enough time reflecting on the
feedback. However, as described in the former sub-chapter, implicit motives and
biases that cannot be self-reported, such as the valence weighting bias, deeply
influence decision-making. Mathematics (Sidenvall, Jäder, & Sumpter, 2015) and
education science (Chong, Shahrill, Putri, & Zulkardi, 2018) are also increasingly
concerned with non-routine tasks and problems, and both fields conclude that
non-routine problem solving requires real-world knowledge and is influenced by
individual beliefs: whether a solution to a problem is simply imitated or
constructed creatively depends on whether a student felt “secure” enough to do so,
and less complex, wrong solutions were favored over the correct, more complex
solution when “it felt too complicated” (Sidenvall et al., 2015, p. 123). This
applies not only to the behavior of students. Implicit motives influencing economic
decision-making have been confirmed by neuronal evidence; however, this insight is
still met with resistance in the field of business administration (Chlupsa, 2014).

Beliefs and implicit processes can lead to bias in decision situations where the
decision-maker lacks the information needed to decide based on former knowledge
(Fazio et al., 2015). Following an inner “status quo” or “inertia” bias, the
decision-maker might prefer consistency over positive feedback (Alós-Ferrer,
Hügelschäfer, & Li, 2016). In other words, the decision-maker might fail to overcome
routine despite feedback, while others overcome their bias and proceed with
non-routine decisions to react effectively to novel circumstances (Chlupsa & Strunz,
2019; Strunz, 2019; Strunz & Chlupsa, 2019).

Thinking time as a resource, approximately measured as response time, can help to
overcome these biases. Response time is defined as the server-side time span between
problem activation and client response (Rubinstein, 2007). Research on response
times in an economic decision-making context stems from brain studies and
neuroeconomics, where brain activity is monitored, e.g. via functional magnetic
resonance imaging (fMRI). Response times are also commonly used in psychology
(Rubinstein, 2007). While there is criticism that most neuroeconomic studies resulted
in “unimpressive economics” (Harrison, 2008, p. 41), some neuroscientific insights
have guided behavioral economic research to this day. Cognitive processes coping with
complexity, e.g. answering survey questions of different lengths, are linked to
response times (Yan & Tourangeau, 2008), which are a well-researched indicator of
overcoming decision biases (Alós-Ferrer, Garagnani, & Hügelschäfer, 2016). Response
times have predictive power when decision-makers face strategic uncertainty (Kiss,
Rodriguez-Lara, & Rosa-Garcia, 2018); e.g., decision-makers show longer response
times when multiple options are seen as equally attractive (Krajbich, Oud, & Fehr,
2014). In order to deduce meaningful information from response times, an agent's
action has to be identified as a cognitive action, an instinctive action, or a
reasonless action. A reasonless action can be the result of a mental decision-making
process involving little or no logical reasoning (Rubinstein, 2007). Section 4.1.12
“Logic and Expected States” refers to this three-fold distinction later on.
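As a minimal sketch of the response-time definition above, assuming a hypothetical log in which the server records when a problem is activated and when the client's response arrives (the record type and field names are illustrative, not taken from Rubinstein, 2007), response times could be computed as follows:

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Trial:
    """One decision with hypothetical server-side timestamps in milliseconds."""
    activated_ms: int  # server time when the problem was activated
    responded_ms: int  # server time when the client's response arrived


def response_time_ms(trial: Trial) -> int:
    """Server-side response time: span between problem activation and client response."""
    if trial.responded_ms < trial.activated_ms:
        raise ValueError("response recorded before activation")
    return trial.responded_ms - trial.activated_ms


def median_response_time_ms(trials: list[Trial]) -> float:
    """Median is used here as a robust summary, since response times are typically skewed."""
    return median(response_time_ms(t) for t in trials)


if __name__ == "__main__":
    trials = [Trial(0, 1200), Trial(5000, 5800), Trial(9000, 12500)]
    print([response_time_ms(t) for t in trials])  # [1200, 800, 3500]
    print(median_response_time_ms(trials))        # 1200
```

Because both timestamps are taken on the server, the measured span also includes the network latency between server and client.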

Performance in CPS stems not only from thinking time but also from the agent's
ability to effectively “identify rules” governing a problem, gaining “rule knowledge”
by understanding the problem's internal causal relations (true rule knowledge), and
“applying knowledge” by controlling the problem and achieving goals (Wüstenberg,
Greiff, & Funke, 2012).

Engaging in non-routine problem solving (NPS) is influenced by a multitude of
factors. Very complex decision-making domains favor heuristic search, while less
complex domains make it possible for the agent to engage in maximization (by
algorithmic operators such as VOTAT), obtaining the true causal relations (true rule
knowledge about the structure and behavior of the domain). Both kinds of solution can
lead to positive feedback, from which routine can grow, and both benefit from
knowledge and learning. When environmental change makes the routine less favorable,
individual valence weighting bias, the power of routine, time pressure, beliefs and
intrinsic metrics can either hinder or favor a change in strategy. In this case,
reflection time is a good predictor of overcoming these mental hurdles. Fewer than
10% of mixed-country participants, about 10% of US-American participants, about 5%
of Indian participants and slightly more than 20% of German participants (Strunz,
2019; Strunz & Chlupsa, 2019) were able to overcome mental hurdles in the NPS
experiment “Flag Run” and changed their strategy based on a mental model “closer” to
the true rules governing the complex problem, in other words: they obtained true
rules. The rules do not change throughout the “Flag Run” experiment. However, the
starting levels of “Flag Run” were constructed in such a way that agents would be
nudged into building a routine based upon a wrong mental model of the causal
relations. Agents were nudged into thinking that they were able to control the
direction of a playing piece, when in fact the direction of the playing piece was
always set to “left” by default. As can be seen in figure 2.11, the distance from the
playing piece to the goal field is two steps both when counting to the left, jumping
across the edge, and when counting to the right, reaching the goal field via the more
intuitive and visible path. Therefore, the left- and right-hand distances to the goal
field are identical. The problem space of “Flag Run” is simple, and the true causal
relations are even simpler than in most “Minimal Complex Systems”. However, not a
single agent demonstrated through its behavior that it had understood the true
causal relations. The reason for this can only be speculated upon; Strunz & Chlupsa
(2019) suspect that the implicit mental model causally relating “direction buttons”
and “controlling directions” is very strongly embedded, leading to a very high
routine strength. As the experiment was short, most agents were not given enough time
to find out all “hidden rules” governing the structure and behavior of the
decision-making system. Strunz & Chlupsa (2019) also tested for a possible
correlation between overcoming routine and self-reported levels of “Joyous
Exploration”, which is part of the multi-dimensional emotion “Curiosity”. However, no
relation between any of the five curiosity dimensions (Kashdan et al., 2018) and NPS
performance was found. Participants who gained true rule knowledge did not report
higher scores in “Joyous Exploration”, and in fact no correlation with any of the
remaining four curiosity dimensions was found. The study did confirm that reflection
time, that is, thinking time measured as response time, paid off.
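The following Python sketch illustrates why feedback in “Flag Run” need not expose the wrong mental model. It assumes, purely for illustration, a single row of four fields with wrap-around at the edges and hypothetical start and goal positions; the actual board shown in figure 2.11 may differ, and all names are hypothetical. Under the true rule described above, the piece always moves left, while the participant believes the pressed direction button controls it.

```python
from typing import Callable

Move = Callable[[int, str], int]
WIDTH = 4  # hypothetical board: one row of four fields, edges wrap around


def true_move(position: int, pressed: str) -> int:
    """True rule as described: the piece always moves one field left,
    jumping across the edge; the pressed button is ignored."""
    return (position - 1) % WIDTH


def believed_move(position: int, pressed: str) -> int:
    """Participant's mental model: the pressed button sets the direction."""
    step = -1 if pressed == "left" else 1
    return (position + step) % WIDTH


def steps_to_goal(move: Move, start: int, goal: int, pressed: str, limit: int = 100) -> int:
    """Count moves until the goal field is reached under a given rule."""
    position, steps = start, 0
    while position != goal:
        position = move(position, pressed)
        steps += 1
        if steps > limit:
            raise RuntimeError("goal not reached")
    return steps


if __name__ == "__main__":
    start, goal = 1, 3  # hypothetical positions: goal lies two fields to the right
    print(steps_to_goal(true_move, start, goal, "right"))      # 2 (left, across the edge)
    print(steps_to_goal(believed_move, start, goal, "right"))  # 2 (right, visible path)
```

Because both paths take two steps, pressing “right” produces exactly the outcome the wrong model predicts, so the feedback confirms the routine instead of challenging it.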
Figure 2.11 Client-side view of the “Flag Run” experiment. Source: Strunz & Chlupsa,
2019, p. 116


Participants were Amazon Mechanical Turk freelancers (MTurks), who benefit
financially from solving any task as fast as possible. Studies have shown that the
main motivation of MTurk workers is “compensation” (Lovett, Bajaba, Lovett, &
Simmering, 2018), which makes them suitable participants for experiments in which
thinking time is associated with costs and is not incentivized (Strunz & Chlupsa,
2019). Noise from cultural differences in uncertainty avoidance, influenced by the
cultural learning environment (Funke, 2014), was expected across all MTurks, and
differences in NPS performance by country of origin were indeed found (Strunz, 2019).
In all “Flag Run” experiments, agents who started investing in reflection time were
more likely to find true rule knowledge (Chlupsa & Strunz, 2019; Strunz, 2019; Strunz
& Chlupsa, 2019). This was true for all countries of origin. Agents who obtained true
rule knowledge solved the overall experiment with fewer operators, or “actions”, and
in a shorter timeframe, and were therefore more efficient, even though they had
invested more time in reflection. Agents who obtained true rule knowledge also showed
fewer meaningless or random operators. Strunz & Chlupsa (2019) assume that these
agents outperformed others in learning from uncertainty or from unexpected feedback:


“While many researches and economists press the importance of skills that enhance 
adaption to changing conditions, it has to be understood that overcoming routine and 
its linked set of behavioral biases is not easily performed, and can probably only be 
done by a small fraction of leaders and employees, when there is not much time to 
reflect on the problem at hand” 


(p. 122). 

While “Flag Run” is less complex in its causal structure than any microworld
experiment, and its causal structure is even simpler than that of most Minimal
Complex Systems, “Flag Run” is still very knowledge-rich. Its hidden rules, which make
it a CPS task, have to be explored by overcoming a mental model stemming from strong
a-priori routine, thereby simulating real economic problems in which decisions have
to be made quickly and in a non-routine manner to adapt to the ever-changing VUCA
world. Agents had to use heuristics, such as ignoring previously learnt information,
and also had to adopt a strategy similar to an algorithmic procedure. “Flag Run”
draws on the advantages of both worlds: the simplicity of Minimal Complex Systems and
the knowledge-rich structures of Microworlds. As an NPS task, “Flag Run” builds upon
the understanding that “all models are wrong” (Sterman, 2002), and experiments built
upon this simple rule will probably further confirm the notion of “complexity from
simplicity”, once beautifully shown by John Horton Conway's “Game of Life”. Nature's
true complexity is simulated in “Flag Run”, as even simple structures can result in
complex problems, due either to our resistance to recognizing that we are “in error”,
to human overconfidence, or to the circumstance of life that unavoidable uncertainty
carries an immanent potential for self-deception. Overconfidence has been shown to be
influenced by testosterone (Dalton & Ghosal, 2018) and can have socially beneficial
effects, such as reducing anxiety or providing information. Overconfidence can also
have negative consequences when it is mostly the result of self-deception and carries
no psychological benefits; the social benefit of overconfidence mainly depends on the
environment and on private information (Schwardmann & Van der Weele, 2017).
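Conway's “Game of Life” is a compact illustration of “complexity from simplicity”. The following minimal Python sketch (an illustration of the analogy, not part of “Flag Run”) implements its standard rules on an unbounded grid: a live cell survives with two or three live neighbors, and a dead cell becomes alive with exactly three. From these rules alone, the five-cell “glider” pattern reproduces its shape every four generations while travelling one cell diagonally.

```python
from collections import Counter

Cell = tuple[int, int]  # (x, y), with y growing downward


def step(live: set[Cell]) -> set[Cell]:
    """One generation of Conway's Game of Life on an unbounded grid."""
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly three neighbors; survival with two or three.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live)
    }


GLIDER: set[Cell] = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

if __name__ == "__main__":
    cells = GLIDER
    for _ in range(4):
        cells = step(cells)
    print(cells == {(x + 1, y + 1) for x, y in GLIDER})  # True
```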

This brings the current sub-chapter to the final conclusion that uncertainty can only
be fully reduced by self-deception. An agent can either invest in some decision frame
through communication, which is associated with costs, to reduce uncertainty with a
risk-averse strategy, or it can mentally nullify uncertainty by self-deception,
risking potential follow-up costs, in other words accepting “deception potential” by
building upon some mental “truth”. As this thesis rests upon the understanding that
uncertainty cannot be fully “eradicated” and that “all models are wrong”, deception
potential is understood as being immanent. A full recap of section 2.4 follows in
section 2.4.5.


2.4.5 The Human Class: An Unbounded Set of Strategies 


In order to fall neither for the “bias bias” (Brighton & Gigerenzer, 2015) nor for
unrealistic assumptions of agents always maximizing, the “middle ground” should not
be ignored, as agents seem to be able to maximize under certain circumstances (Erev
& Roth, 2014) while still forming biased attitudes (Fazio et al., 2004, 2015;
Rocklage & Fazio, 2014) towards problems through exploration that reduces
uncertainty. Problems under uncertainty and under risk are to be separated, although
risk and uncertainty can be linked in a continuum (Samson & Gigerenzer, 2016), with
both ends controlled by learning from feedback (Van der Kleij et al., 2015). Feedback
is easily misinterpreted, and all learning is a feedback process (Sterman, 2006). In
complex environments, learning from feedback is also influenced by the framing or
interpretation of information, the order of information coming from feedback (Keil
et al., 2016) and real-world deception (Fuller et al., 2011). Individual
characteristics, fear of uncertainty or a lack of resources (Chong et al., 2018;
Holmes-Rovner et al., 2000) make adaptation to new conditions a challenge, due to
routine strength (Betsch et al., 2001) and cognitive dissonance when facing
counterintuitive problems (Donnarumma et al., 2016). In order to measure CPS, which is
linked to NPS (Neubert et al., 2015), it is important to realize that strategy change
will only occur when change actually interferes with an agent's strategy (Cañas,
Quesada, Antolí, & Fajardo, 2003). Experiments should measure the critical success
factor for NPS, namely experiential learning (Scholten et al., 2019), by looking at
when behavioral changes occur (De Houwer et al., 2013). To this end, the experimenter
can observe each action performed by all agents live, as shown in figure 2.12.
Figure 2.12 Participants' actions can be watched live via the Curiosity IO backend.
Source: own source


Reflection time was found to be an effective predictor of overcoming “wrong” mental
models (Strunz & Chlupsa, 2019), while this thesis rests upon the understanding that
all mental models are wrong (Sterman, 2002) and that uncertainty can only be
nullified by self-deception, which comes with advantages and disadvantages
(Schwardmann & Van der Weele, 2017). Therefore, deception potential is regarded as
immanent. For this reason, complexity can grow from very simple problem spaces, with
“Flag Run” combining the advantages of both Microworlds and Minimal Complex Systems
when measuring whether an agent is able to find hidden information and to adapt its
strategy based upon novel knowledge under circumstances where time is considered a
resource.

Combining these insights, agents are seen as analogous to disturbances, which can exhibit special features leading to outcomes that are more than just a nonconformity to some anticipated value. Agents are regarded as an “unbounded set of strategies” that produce perturbing deviations. As every model is wrong, neither a theory nor a decision-making agent can ultimately nullify uncertainty (creating the bound of some set); when it appears to do so, it can only do so by self-deception (defining some set with a bound). Consequently, any theory predicting human behavior will be wrong under the right circumstances (redefining the set’s boundary). Classifying a model as “descriptive”, “normative” or “prescriptive” seems to avoid this problem at first sight, but whether this differentiation has led to sustainable “normative” models, more efficient “descriptive” re-evaluations or more precise “prescriptions” is discussed in section 2.5.
In mathematics, problems arising from set theory, which closely resemble the problems of trying to establish various “types” of decision-making models (normative, descriptive, prescriptive etc.), are elegantly solved by introducing an “unbounded set”, better expressed as a “class”, and by leaving the more primitive set-theoretical understanding behind altogether, basing mathematics on “category theory”. In this understanding, a “human class” is always placed “at the first position of any model” and is followed by whatever reality this very agent frames, with its interpretation of risk and/or uncertainty produced by some expert system. Framing reality is highly individualistic and depends on beliefs stemming from intrinsic motives; therefore, the attempt to combine models can always lead to unexpected disturbances, which result from the network of interdependent beliefs. As mentioned before, W. Brian Arthur (1995) interpreted the financial market as such a network of interdependent beliefs. All of the above is expressed in “Complexity Economics”, which does not assume that an economy is necessarily in equilibrium. Agents change their actions and strategies according to the outcome they collectively create, which constantly induces change, to which they adapt their strategies anew. In a complex economy, agents’ strategies and beliefs are frequently tested, and the entire system is best described as a redundant, ever-changing function, analogous to the definition of “information” given earlier. Complexity economics therefore defines an economy not as something physically existing, but as a network of contingent states embedded in indeterminacy, where outcomes are based on interdependent sense-making and the entire system is necessarily open to change (Arthur, 1999).

Section 2.5 focuses on how agents’ decision making is altered in a network in which they may assume feedback to be random, machine-made or human-made, in which they can communicate with or deceive others, and in which they must solve problems when communication is impossible. Before turning to these questions, section 2.5 begins with a more precise definition of “model” and its related categories from decision-making theory.


2.5 A Network of Interdependent Beliefs 


Models in decision-making theory can be divided into three categories or types: normative, descriptive, and prescriptive. “The three-way distinction emerged clearly in the 1980s (Freeling, 1984; Baron, 1985; Bell et al., 1988—all of whom wrote independently of each other), although various parts of it were implicit in the writing of Herbert Simon and many philosophers (such as J. S. Mill).” (Baron, 2012).

Descriptive models are concerned with how and why agents actually decide as they do, normative models describe how agents should ideally decide, and prescriptive models are concerned with prescribing features that improve a given decision-making process (Mandel, Navarrete, Dieckmann, & Nelson, 2019).

According to Baron (2012), normative models need not, and should not, be justified by observations of behavior, as long as enough data has been acquired by observation to frame the normative model clearly; less obvious normative models, e.g. simple correspondence, are justified by philosophical or mathematical argument.

Baron (2012) describes descriptive models as psychological theories that often explain in cognitive terms how agents behave. These models include heuristics, strategies, and formal mathematical models. When observations depart from normative models, useful descriptive models can explain these departures; systematic departures in behavior are referred to as “biases”.

Baron (2012) defines prescriptive models as engineering models, originally conceived either as mathematical tools for analyzing decisions or as educational interventions, such as teaching agents heuristics that avoid decision-making strategies which can lead to bias under certain circumstances. Prescriptive models also include the idea of nudging people towards normatively better choices.

Baron (2012) argues that this three-fold distinction is necessary and that the three model types should not be conflated, so that judgements and decisions can be improved or at least maintained in quality; to do so, it must be understood what makes judgements “good”. Baron therefore suggests introducing distinguishing categories regarding quality, so that data on the “goodness” of particular judgements can be collected, monitored, and tested for improvement potential (Baron, 2012).

Under the concept of “judgment and decision making”, models are to be defined in order to improve judgements and decisions; they have to be re-evaluated against the three-fold classification of models, and they must define what “good” judgements are and which circumstances alter them positively or negatively. This chapter considers judgements within an interdependent network of beliefs.

Section 2.5.1 introduces the theoretical approach to multiplayer decision-making, and section 2.5.2 focuses on multiplayer experiments in behavioral economics.


2.5.1 From Game Theory to Behavioral Game Theory 


Game theory has become a fundamental tool not only for theoretical but also for empirical economic science (Fudenberg & Levine, 2016). Game theory examines multiplayer decision-making scenarios, referred to as “games”, and is more than an abstract economic model. According to Fudenberg & Tirole (1991), game theory has, for example, been applied in theoretical biology, where animals are considered agents that follow a set of pure strategies.

The most important aspect of game theory is the distinction between individual decisions and “games”. While isolated agents are only concerned with uncertainties stemming from their surrounding environment, multiple agents making interdependent decisions within a common decision-making domain also have to consider uncertainties arising from their co-agents’ behavior, which can potentially influence the actions of all agents (Fudenberg & Tirole, 1991). Another key difference between individual decisions and games concerns “zero probabilities”, i.e. unrealized decision potential, which is irrelevant for individual decisions but an intrinsic cornerstone of games (Fudenberg & Tirole, 1991). To predict how a game will play out or change its path, “Nash equilibria” are used. A Nash equilibrium describes a certain path or recipe for how a game will unfold such that, if all agents figured out that this equilibrium will be reached, no agent has any reason to deviate from the prescribed recipe. According to this logic, only a Nash equilibrium can be predicted by agents and assumed to be predicted by co-agents. If a prediction concludes that an outcome other than a Nash equilibrium is reached, the agent or a co-agent must have made a “mistake” or “error” (Fudenberg & Tirole, 1991).
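The equilibrium condition described above, that no agent has a reason to deviate from the recipe as long as all others follow it, can be made concrete for two-player games with finitely many actions. The following Python sketch is not taken from Fudenberg & Tirole (1991); it only checks pure strategies (mixed equilibria are not computed), and the function name `pure_nash_equilibria` and both example games are hypothetical illustrations.

```python
from itertools import product


def pure_nash_equilibria(payoff_row, payoff_col):
    """Return all pure-strategy Nash equilibria (i, j) of a two-player game.

    payoff_row[i][j] and payoff_col[i][j] are the payoffs to the row and the
    column player when row plays action i and column plays action j.
    """
    n_rows, n_cols = len(payoff_row), len(payoff_row[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Row player: no other row yields more against column action j.
        row_ok = all(payoff_row[i][j] >= payoff_row[k][j] for k in range(n_rows))
        # Column player: no other column yields more against row action i.
        col_ok = all(payoff_col[i][j] >= payoff_col[i][k] for k in range(n_cols))
        if row_ok and col_ok:
            equilibria.append((i, j))
    return equilibria


# Hypothetical coordination game: meeting on action 0 pays 2, on action 1 pays 1,
# miscoordination pays 0 to both players.
coordination = [[2, 0], [0, 1]]
print(pure_nash_equilibria(coordination, coordination))  # [(0, 0), (1, 1)]

# Hypothetical matching-pennies game: no pure-strategy equilibrium exists.
pennies_row = [[1, -1], [-1, 1]]
pennies_col = [[-1, 1], [1, -1]]
print(pure_nash_equilibria(pennies_row, pennies_col))  # []
```

The coordination game has two equilibria, so the equilibrium condition alone does not tell agents which one to expect, while matching pennies has none in pure strategies at all; both cases show why additional assumptions about beliefs and errors are needed to turn equilibria into predictions.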

Fudenberg & Tirole (1991) go on to state that “errors”, such as “mistakes”, may well occur, and that predicting them would require the game theorist to know more about the outcome of the game than its participants do. The authors state that Nash equilibria cannot be considered “good predictions” in all situations, since not all relevant information is contained in the game-theoretical model, for instance the individual experiences of the participants, which can be shaped by culture.

To obtain a complete theory, the authors state, “error” can be regarded as a human mistake occurring “with small probability”. Error can also originate in “payoff uncertainty”, which leaves both modeller and player unable to be fully certain about any payoff value, as suggested by Fudenberg, Kreps and Levine (Fudenberg & Tirole, 1991, p. 467). Allowing even small payoff uncertainty can have large effects. According to Fudenberg, Kreps and Levine, no economically interesting situation lacks payoff uncertainty, so thought experiments that exclude it may not be reasonable. This is referred to as the “uncertainty problem”. Whichever cause of “error” a model identifies as the most likely one determines the best model for a specific set of data; however, even small causes have the power to shift an equilibrium (Fudenberg & Tirole, 1991).
With the introduction of constant uncertainty, common knowledge is defined so that it includes not only payoff uncertainty but also each agent’s initial uncertainty about the game’s structure (Fudenberg & Tirole, 1991). This formal definition of knowledge leads to “technical and philosophical problems” (Fudenberg & Tirole, 1991, p. 547), some of which were already noted with reference to the “frame problem”. Moreover, small changes (perturbations) in a game’s information structure (common knowledge in an informal sense) can change an agent’s knowledge and thereby alter common knowledge, which renders any exact description of common knowledge fuzzy (Fudenberg & Tirole, 1991). One way of handling this fuzziness is the “almost common knowledge” concept by Monderer and Samet (1989), which “requires that all players be ‘pretty sure’ that their opponents are ‘pretty sure’ about payoffs (...)” (Fudenberg & Tirole, 1991, p. 564).

The authors show how Nash equilibria can be changed entirely by perturbations in the game’s information structure and state that the sensitivity


“of even the Nash-equilibrium set to low-probability infinite-state perturbations is 
another reason to think seriously about the robustness of one’s conclusions to the 
information structure of the game.” 


(Fudenberg & Tirole, 1991, p. 570). 

In more modern approaches to game theory, payoff uncertainty is usually part of testing models for stability. To extend game theory, several behavioral models were developed under the label “Behavioral Game Theory” (Camerer & Ho, 2015): the cognitive hierarchy (CH) model, used to predict initial choices in a repeated game; the quantal response equilibrium (QRE), in which agents may make small mistakes while holding correct beliefs about their co-agents’ (equally noisy) behavior; experience-weighted attraction (EWA) learning, which predicts a path of decisions as a function of initial conditions; and various learning models that account for co-agents’ learning, strategic teaching and reputation-building, leading to play outside of equilibrium.
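To illustrate how the quantal response equilibrium relaxes the Nash condition, the sketch below uses the logit form of QRE, the most widely used specification, in which each action is chosen with a probability that increases with its expected payoff instead of with certainty. The precision parameter `lam`, the function names, the damped fixed-point iteration and the example payoffs are assumptions for illustration and are not taken from Camerer & Ho (2015); damped iteration is a simple solver that is not guaranteed to converge in every game.

```python
import math


def logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit_qre_2x2(payoff_row, payoff_col, lam, damping=0.5, tol=1e-12, max_iter=100_000):
    """Approximate a logit quantal response equilibrium of a 2x2 game.

    Returns (p, q): probabilities that the row and the column player choose action 0.
    lam is the precision: lam = 0 gives uniformly random choice, large lam
    approaches exact best responses.
    """
    p, q = 0.5, 0.5
    for _ in range(max_iter):
        # Expected utility of each action, given the opponent's current mixed strategy.
        row_u0 = q * payoff_row[0][0] + (1 - q) * payoff_row[0][1]
        row_u1 = q * payoff_row[1][0] + (1 - q) * payoff_row[1][1]
        col_u0 = p * payoff_col[0][0] + (1 - p) * payoff_col[1][0]
        col_u1 = p * payoff_col[0][1] + (1 - p) * payoff_col[1][1]
        # Logit ("noisy best") responses: better actions are only more likely.
        p_target = logistic(lam * (row_u0 - row_u1))
        q_target = logistic(lam * (col_u0 - col_u1))
        # Damped step towards the responses (assumed solver, not part of QRE itself).
        new_p = p + damping * (p_target - p)
        new_q = q + damping * (q_target - q)
        if abs(new_p - p) < tol and abs(new_q - q) < tol:
            return new_p, new_q
        p, q = new_p, new_q
    return p, q


# Hypothetical game in which action 0 pays exactly 1 more for both players.
row = [[3, 1], [2, 0]]
col = [[3, 2], [1, 0]]
for lam in (0.0, 1.0, 5.0):
    print(lam, logit_qre_2x2(row, col, lam))
```

With `lam = 0` both players choose at random, and as `lam` grows the choice probabilities approach the Nash prediction of always playing action 0; the remaining gap is the “small mistake” that QRE allows.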

The term “behavioral” is used in economics, psychology and the decision sciences roughly to mean “being about mental processes” (Gavetti, 2012, p. 267). Modern behavioral economics still studies “noise” in coordination games, where agents deviate from their routine because of mistakes, misperceptions, inertia or trial and error (Mas & Nax, 2016). Mas & Nax (2016) designed a complex experiment in which groups of 20 subjects played coordination games with multiple network partners over 150 rounds. The subjects were informed neither about the game’s causal structure nor about their co-agents’ types; they were informed about their own payoff and their co-agents’ choices in the last round, but not about their opponents’ payoffs. The experiment found 96% of decisions to be myopic best responses, with deviations from best responses being highly sensitive to their costs.
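A myopic best response, as described above, means choosing the action that would have paid most against the co-agents’ choices of the last round, without anticipating how they will react. The following Python sketch shows such an update rule on a small network; it does not reproduce the design of Mas & Nax (2016), and the function name, the payoffs, the line-shaped network, the synchronous updating and the rule of keeping the current action on ties are all hypothetical assumptions.

```python
def myopic_best_response_round(actions, neighbors, payoff):
    """One synchronous round of myopic best responses on a network.

    actions:   dict agent -> action chosen in the last round
    neighbors: dict agent -> list of network partners
    payoff:    payoff[a][b] = payoff for playing a against a partner playing b
    """
    def total(agent, a):
        # Payoff of action a against each partner's last-round choice (no look-ahead).
        return sum(payoff[a][actions[n]] for n in neighbors[agent])

    new_actions = {}
    for agent, current in actions.items():
        best = max(payoff, key=lambda a: total(agent, a))
        # Inertia on ties: keep the current action if it is already a best response.
        new_actions[agent] = current if total(agent, current) == total(agent, best) else best
    return new_actions


# Hypothetical coordination payoffs and a four-agent line network 0-1-2-3.
payoff = {"A": {"A": 2, "B": 0}, "B": {"A": 0, "B": 1}}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
actions = {0: "A", 1: "B", 2: "B", 3: "A"}
for round_no in range(1, 4):
    actions = myopic_best_response_round(actions, neighbors, payoff)
    # Round 1 prints {0: 'B', 1: 'A', 2: 'A', 3: 'B'}; rounds 2 and 3 print all 'A'.
    print(round_no, actions)
```

Because all agents respond simultaneously to the previous round, agents 0 and 3 switch to B while agents 1 and 2 switch to A, so the group first miscoordinates before settling on A in the second round; myopic responses react to the past, not to what co-agents are about to do.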

Costs and feedback create boundaries that have to be overcome in order to maximize through learning. Bayer & Chan (2007) studied the well-known “Dirty-Faces” game in a laboratory experiment, in which iterative thinking (“he knows that I know that he knows ...”) is required in order to follow common rationality. The authors concluded that a threshold exists between participants performing one level of iteration and those performing two or more, either because individuals are limited in their ability to apply such meta-cognitive thinking or because agents considered higher-order meta-level thinking useless, expecting their co-agents to be unable to perform it themselves (Bayer & Chan, 2007).

The cognitive hierarchy (CH) model attempts to anticipate human behavior in one-shot games based on the number of meta-thinking steps a participant performs (Camerer, Ho, & Chong, 2001). Agents who perform zero steps of thinking are considered to behave randomly, irrationally or non-strategically. Participants who perform at least one step of iterated thinking are considered to behave strategically. The CH model requires an estimate of how thinking steps are distributed among the participants. For this purpose, the Poisson distribution is used, which is parsimonious because a single parameter, the mean number of thinking steps, determines the whole distribution; participants’ heterogeneity is thereby built into a thinking-steps model that calculates the initial probability of each individual choice. The model was fitted to data from three studies comprising 2,558 subject-games (Camerer & Ho, 2001). The thinking-steps model outperformed the quantal response equilibrium model, which assumes a single type for all participants; the strength of the thinking-steps model was considered to be its modelling of many types across participants, i.e. agent heterogeneity. Compared with the classic Nash equilibrium predictions, the thinking-steps predictions were closer to the data. The Nash equilibrium predictions mostly lay at the limits, being either 0 or 1. This is shown in figure 2.13 (Camerer & Ho, 2001).
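The thinking-steps logic can be sketched in a few lines. The example below applies it to a p-beauty contest (everyone guesses a number in [0, 100], the target is p times the average guess), a standard illustration that is not part of the studies cited above: level-0 players guess at random, and a level-k player best-responds to a mix of levels 0 to k-1 weighted by renormalized Poisson frequencies. The values τ = 1.5 and p = 2/3, the truncation at 10 levels, the function names and the simplification that a player ignores its own effect on the average are illustrative assumptions.

```python
import math


def poisson_pmf(k, tau):
    """Probability that a player performs exactly k thinking steps."""
    return math.exp(-tau) * tau ** k / math.factorial(k)


def ch_beauty_contest(tau=1.5, p=2 / 3, max_level=10, level0_mean=50.0):
    """Cognitive-hierarchy prediction for a p-beauty contest (guess p times the mean).

    Returns (mean choice of each thinking level, predicted population mean).
    """
    f = [poisson_pmf(k, tau) for k in range(max_level + 1)]
    # Level 0 guesses uniformly at random on [0, 100]; only its mean (50) matters here.
    choices = [level0_mean]
    for k in range(1, max_level + 1):
        # Level k believes the others are levels 0..k-1, weighted by the
        # Poisson frequencies renormalized over those lower levels.
        lower = f[:k]
        believed_mean = sum(w * c for w, c in zip(lower, choices)) / sum(lower)
        # Best response, ignoring the player's own effect on the mean (large group).
        choices.append(p * believed_mean)
    # Population prediction: Poisson-weighted mix of all levels (truncated at max_level).
    predicted_mean = sum(w * c for w, c in zip(f, choices)) / sum(f)
    return choices, predicted_mean


levels, mean = ch_beauty_contest()
print([round(c, 2) for c in levels[:4]], round(mean, 2))
```

Unlike the Nash prediction for this game (everyone guesses 0), the heterogeneous mix of thinking steps yields an interior population average, which illustrates why Nash predictions tend to sit at the limits while thinking-steps predictions lie closer to observed behavior.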

Still, participants in interactive games have not been shown to follow the game-theoretical assumptions of common knowledge and rationality in real-life experiments, and Colman (2003) rejected common knowledge and rationality as a model of social interaction. Drew Fudenberg and David K. Levine (2016) suggested enhancing game theory with learning theory through simulation, using belief-based learning models: simplicity is maintained by embedding complex learning theory into game theory, and breadth is established by combining static game theory with dynamic learning theory (Fudenberg & Levine, 2016).
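Fictitious play is a classic example of a belief-based learning model: each player forms beliefs from the observed frequencies of the opponent’s past actions and best-responds to them, so a static game is turned into a dynamic learning process. The sketch below is a minimal, hypothetical illustration and not the simulation setup of Fudenberg & Levine (2016); the function name, the uniform initial pseudo-counts, the tie-breaking towards the lowest-numbered action and the example games are assumptions.

```python
def fictitious_play(payoff_row, payoff_col, rounds=100):
    """Two-player fictitious play, a basic belief-based learning model.

    Each round, both players best-respond to the empirical frequencies of the
    opponent's past actions. Returns the sequence of action pairs played.
    """
    n_rows, n_cols = len(payoff_row), len(payoff_row[0])
    # Initial beliefs: one pseudo-observation of every action (an assumption).
    seen_row = [1] * n_rows  # how often the row player has been observed to play i
    seen_col = [1] * n_cols  # how often the column player has been observed to play j
    history = []
    for _ in range(rounds):
        # Expected payoffs against the opponent's empirical mix; raw counts
        # give the same best response as normalized frequencies.
        row_action = max(range(n_rows), key=lambda i: sum(
            payoff_row[i][j] * seen_col[j] for j in range(n_cols)))
        col_action = max(range(n_cols), key=lambda j: sum(
            payoff_col[i][j] * seen_row[i] for i in range(n_rows)))
        seen_row[row_action] += 1
        seen_col[col_action] += 1
        history.append((row_action, col_action))
    return history


# Hypothetical coordination game (same payoffs for both players).
coordination = [[2, 0], [0, 1]]
print(fictitious_play(coordination, coordination, rounds=5))
# [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
```

Here the uniform initial beliefs already favor the higher-paying action, so play locks in immediately; with other priors or games the same rule can produce long adjustment phases or cycles, which is what makes learning dynamics informative beyond the static equilibrium.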
Figure 2.13 Behavioral game theory vs. game theory, experimental results. Panels: fit of the thinking-steps model to three games (R² = 0.84); Nash equilibrium predictions versus data in three games (axes: equilibrium prediction, data). Source: Camerer & Ho, 2001, p. 128
Today, some critics also claim that game theorists wrongly assume the uncertainty problem to be solved by agents accumulating information. Syll (2018) goes as far as stating that the game-theoretical object of rationality cannot be described under persisting uncertainty, which renders game theory “irrelevant and useless”, whereas the true challenge is to explain the existence of heterogeneous transactions and social interactions while accepting ever-remaining uncertainty. In another highly critical article, the economist Bernard Guerrien (2018) quotes Andrew Schotter, a former student of Morgenstern, to show that game theory has more fruitful potential in the domain of “cooperative” rather than “non-cooperative” game theory. According to this quote by Schotter, von Neumann and Morgenstern originally tried to escape the problems stemming from the infinite chain of meta-thinking by considering strategically interdependent situations independently of the players’ expectations about their co-agents (Guerrien, 2018). Guerrien (2018) further states that game-theoretical constraints concerning which information is available to participants are unrealistic and have never been verified by experiments.

In total, game theory forms an important backbone of behavioral economics, from which many fruitful concepts, insights and ideas have emerged. Learning, as a feedback process, benefits from knowledge. As stated before, knowledge cannot be formalized by any instance, game theory included, because formalization would instantly turn it into information. A formalization of knowledge results in paradoxes, infinite-regress problems and logical debates, similar to the problems of old-fashioned “set theory” or those described by the “frame problem”. Therefore, game theory is as constrained in its possibilities as any other way of modelling reality: it offers normative models for efficient computations, can serve as a platform for useful explanations in the form of descriptive models, or can be used as a “language” to build prescriptive models that enhance decision making. Game theory has also shown the importance of sensitivity to perturbations in any normative model that builds upon the concept of an information structure. In the author’s understanding, game theory does not claim to have solved the uncertainty problem, at least not without qualification. Fudenberg and Tirole (1991), one of the gold standards of game theory, which summarizes the game-theoretical insights up to 1989 in a single book, state several times that models assuming perfect information or zero uncertainty are not meaningful, not even as a Gedankenexperiment.

The next sub-chapter concentrates on a few examples of group behavior observed under experimental conditions and described by game-theoretical normative models.
2.5.2 Group Behavior


As stated before, this thesis is interested in how group behavior changes when groups are confronted with different types of information. Specifically, this thesis’ experiment simulates group decision making under uncertainty in which agents cannot communicate with each other. To describe research in this domain, several phenomena of group decision making with communication are also reviewed, so that such behavior can be excluded later on.

Decision problems involving more than one decision maker are studied in the domain of group decision making (GDM) (G. Li, Kou, & Peng, 2018). Studies on GDM usually consider how much communication is allowed and how the final outcome is produced by the group (Tindale & Winget, 2019). Tindale & Winget (2019) list several insights from GDM research: members with high expertise on the task at hand can improve overall group performance; individual motivation for the whole group to make “accurate decisions” has a positive impact on group performance; groups can perform well without communication; communication will decrease group performance in situations where members are “less than wise”; and biases shared by group members about the decision environment will be “exacerbate[d]” (p. 28). When communication between agents is possible, imitation and herding are popular examples of group behavior.

Imitation and innovation have been described as the “dual engines of cultural learning” (Legare & Nielsen, 2015). Humans are known to imitate each other during social interactions, which improves action comprehension, for example language comprehension (Adank, Hagoort, & Bekkering, 2010). Even emotions can be “imitated” in the form of emotional contagion, which can improve perceptions of task performance (Barsade, 2002). “Imitate-the-best” and “imitate-the-majority” have been found to speed up individual learning under uncertainty (Garcia-Retamero, Takezawa, & Gigerenzer, 2009).

In a software-based experiment, groups were found to discover novel solutions to problems that individuals would have missed, since interpersonal imitation pushes the group towards more promising solutions; however, the size of the group can have a significant and nonlinear impact on the group’s behavior and performance when imitation is used (Wisdom & Goldstone, 2011).

Besides imitation, the herding effect is especially relevant in crowd psychology, where irrationality can arise from group behavior and accumulate deception potential, leading to phenomena such as “exploding market bubbles”, also referred to as “information cascades” (Samson & Gigerenzer, 2016, p. 109). Herding describes individual agents imitating the behavior of the group as a whole rather than following their own strategies (Hwang & Salmon, 2004). Game-theoretical analysis has shown that the timing and frequency of public information can affect the collective learning process, and that public information can help a herd to overcome wrong beliefs and inefficient paths (Bohren, 2014). Neuroscientific models suggest that social alignment is mediated by a system that monitors misalignment and rewards actions leading back to alignment (Shamay-Tsoory, Saporta, Marton-Alper, & Gvirts, 2019). Such monitoring requires information; accordingly, herding in international markets was found to depend on the level of information transparency (Choi & Skiba, 2015).

When communication between agents is impossible because of costs, security, technical problems or language barriers, coordination and cooperation in a problem space can still be achieved through “focal” real-life decision influencers or prominent solutions, referred to as “focal points” or “Schelling points” (Zuckerman, Kraus, & Rosenschein, 2011). They are defined as “a point of convergence of expectations or beliefs without communication” (Teng, 2018, p. 250). Schelling points were proposed as refinements of the Nash equilibrium, where the ideal game-theoretical strategy has to consider both cooperation and the coordination of potential conflict (Teng, 2018). Experiments found groups to outperform individuals in coordination games with focal points when the individual interests of group members were compatible and cognitive input helped to resolve the coordination problem (Sitzia & Zheng, 2019); when interests are not aligned, groups show worse coordination. In coordination games, groups are also more sensitive to salience (Sitzia & Zheng, 2019).

Group planning behavior differs from individual planning behavior in decision environments governed by either objective or subjective risk, the latter being referred to as “ambiguity” (Carbone, Georgalos, & Infante, 2019). In their study, Carbone, Georgalos & Infante (2019) focused on how sequential group decision making reacts to novel information. In the objective risk treatment, participants were informed about the statistical chances of their income, i.e. about the number of balls hidden in urns, so that they could form a realistic mental model of how the experiment created risk. In the subjective risk treatment, no such information was provided. While both individuals and groups were found to deviate “substantially” from the optimal theoretical strategy when facing a stochastic and dynamic problem, individuals outperformed groups under objective risk, whereas groups outperformed individuals under subjective risk (Carbone et al., 2019). Furthermore, both groups and individuals were found to make myopic decisions under objective and subjective risk (Carbone et al., 2019). When tested for planning, groups were closer to rationality under ambiguity, creating more welfare (Carbone et al., 2019).
The study concludes that there exists a non-neutral attitude towards ambiguity (Carbone et al., 2019). Such attitudes also affect trust decisions: a negative attitude towards ambiguity correlated with a more negative attitude towards trusting options, while agents who considered themselves trustworthy were more likely to trust other agents (C. Li, Turmunkh, & Wakker, 2019). Therefore, subjective beliefs about others can have a crucial influence on group decision making. Andersen, Fountain, Harrison, & Rutström (2014) suggest not modelling subjective belief simply as subjective probability, since the risk attitudes of individual agents have to be carefully considered first. They describe how to model subjective belief as an open question, noting that agents may hold not only a traditional aversion towards risk but also an aversion towards uncertainty when decisions are made in a domain governed by subjective rather than objective uncertainty (Andersen et al., 2014).

A good example of individual behavior towards uncertainty comes from random group-matching procedures. Multiplayer experiments that match participants randomly can leave participants feeling unfairly treated when their partners behave suboptimally towards them; this was in fact observed during pre-tests of the thesis’ experiment. Ballinger, Hudson, Karkoviata, & Wilcox (2011) claim that “working memory capacity” (WMC) mediates participants’ ability to react to such situations with more or less composure, with WMC acting as a mental buffer (Ballinger et al., 2011). WMC is also claimed to predict how well agents adapt their “depth of reasoning” in experiments of growing structural complexity (Ballinger et al., 2011; Strunz, 2019).

Groups commonly outperform individual decision makers through interpersonal communication (Charness, Cooper, & Grossman, 2015). When subjects work together via computer interfaces, communication costs can, counterintuitively, enhance group performance: while higher communication costs reduce the quantity of messages, they enhance message quality, so that groups facing communication costs significantly outperform individuals (Charness et al., 2015). In particular, when communication is likely to introduce error, more communication is not always better than less; in such cases, assigning costs to communication enhances performance (Charness et al., 2015).

Group decision making under uncertainty profits from communication, as shared information increases decision quality when it is sufficiently processed by the group; when shared information is insufficiently processed, groups tend to be overconfident in their decisions (Sniezek, 1992). Social factors such as face-to-face discussion and the goal of reaching consensus are described as influencing group confidence (Sniezek, 1992).

Experimental research on the influence of expertise and information in GDM under uncertainty in environments where no communication is possible is scarce. An example of such a domain would be multiple agents, each working at a personal computer and making investment decisions in a market without knowing their co-agents. Nevertheless, all agents’ decisions are interdependent, and all agents collectively see the same market results. Uncertainties might arise from different sources, such as the number of co-agents, the causal relationship between the group’s investment and market results, or whether one’s own action has any effect. Uncertainties can also stem from doubt, e.g. whether an optimal group strategy exists, whether such a strategy could actually be achieved, and whether maximization is possible with limited information about the causal relations. In general, two kinds of uncertainty in group decision making are considered: environmental and social uncertainty (Messick, Allison, & Samuelson, 1988).

Communication has been shown to substantially reduce environmental and social uncertainties “by enhancing group coordination and performance” (Messick et al., 1988, p. 678). Furthermore, experiments showed that agents are risk-averse with respect to environmental uncertainty and less influenced by social uncertainty, while individual risk aversion was not influenced by communication at all (Messick et al., 1988).

Individual experience, such as proficiency, also mediates how external information is interpreted and which measures are ultimately taken, as shown in disaster risk reduction decision-making: risk expressed in numbers and risk expressed by verbal cues differed in their impact, and the impact also depended on whether an agent was a scientist, since scientists had more experience with risk expressed as numerical probabilities (Doyle, McClure, Paton, & Johnston, 2014). However, verbal cues were consistently regarded as more ambiguous than numerical terms (Doyle et al., 2014). In addition, participants commonly misinterpreted probabilities (Doyle et al., 2014).

In summary, performance in GDM under uncertainty without the ability to communicate with other agents will be influenced by individual expertise, owing to routine strength and to how individuals interpret information. The lack of communication does not necessarily lead to worse group performance, which instead depends on the group’s collective capacity for “wise” decision making, the decision domain’s resistance to error perturbations, and individual motivation for the group to make accurate decisions. Biases stemming from communication, such as social influence, herd behavior, imitation, collective bias and overconfidence, can be excluded. Prominent solutions or Schelling points can be expected when all agents have had similar “single-player” experience and expertise. Ideally, working memory capacity should be measured in such an experiment in order to understand individual resistance to the stress of “unfair” group constellations. Individual types of risk aversion should not be affected by the lack of communication. In the case of GDM under uncertainty, risk is understood as the individual attempt to try a strategy that deviates from one that has proven effective in the past. Since participants in GDM have been shown to be less risk-averse towards social uncertainty and more risk-averse towards environmental uncertainty, information that an agent interprets as good reason to believe that other agents are influencing the game is more likely to lead to deviation from the previously “effective” strategy than information that the agent interprets as good reason to believe that random or uncontrollable factors are influencing the game.
General Research Objectives


In order to formulate specific research objectives, this chapter condenses insights from the theoretical background and derives key objectives to be analyzed empirically. The chapter follows the "Engaged Scholarship Diamond Model", which was designed to close the theory-practice gap (Ntounis & Parker, 2017). In this model, the researcher addresses the four domains "Theory Building", "Problem Formulation", "Problem Solving" and "Research Design" in any preferred order (Van de Ven, 2007). This thesis' conclusion serves as "Problem Solving" and will bridge real-world problems ("Reality") and the results of the empirical research ("Solution"). Moving from reality to theory, the problem formulation covered the potential rise in complexity caused by globalization and the limitations of humans in complex problem-solving. Theory then identified information and expert knowledge as critical influences on individual and group decision-making. The first of the following sub-chapters summarizes key findings of the theoretical background. The second sub-chapter explains the resulting model, which is linked to a solution via the experimental research design. The third sub-chapter provides a brief framework of a suitable experiment (Figure 3.1).


3.1 Summary of Key Findings 


Besides limitations in financial resources, change and human resources were described as the fundamental problems for interconnected institutions engaged in complex problems of global proportions. In order to better cope with unpredictable change, expert knowledge is increasingly embedded in decision-making processes. Routine strength can prevent decision-makers from adapting to change effectively, while knowledge and feedback interpretation influence success in overcoming routine. Expertise is formed by many iterations of acting in a certain domain, with heterogeneous feedback coming from this domain. It is therefore a learning process, as all learning is a feedback process. The environment of such a domain is a predictor of maximization and of learning itself. It can also lead to bias and to self-deception built upon illogical or even logical mental models. This can make adaptation to a novel, more efficient and effective strategy either harder or easier. Experiments have shown that environmental conditions only lead an agent to change strategy when feedback, or the agent's interpretation of feedback, confirms that the new conditions cause a performance downswing if the routine strategy is not altered. Environmental conditions generally lead to different behavior depending on whether their source is framed as man-made or as stochastic. Social or man-made change leads agents to try to optimize via pattern recognition, whereas stochastic change leads agents to try to maximize via logical rationale. Risk expressed as either verbal or numerical probabilities is interpreted differently depending on the agent's knowledge; however, humans tend not to behave optimally when probabilities are provided. Groups and individuals behave differently when facing problems under uncertainty, also depending on whether or not group members are able to communicate with one another. Group performance is also influenced by its members' expertise and performance, while individual expertise is hard to predict reliably via knowledge span, e.g. years of experience.

Figure 3.1 The Engaged Scholarship Diamond Model by Van de Ven (2007). Source Ntounis & Parker, 2017, p. 353

Two major aspects are therefore to be researched: the impact of public information and the impact of expertise on group decision-making when facing a problem under uncertainty. Public information will be communicated either actively or passively. Active communication uses text messages announced by pop-up notifications, and passive communication uses visual cues. In both cases, the public information therefore concerns change. There will also be a condition in which change is announced neither actively nor passively, and agents will have to detect the change themselves through feedback interpretation. Change will either impede strategy performance or not. The dependent variables will focus not only on decision-making performance but also on behavior, and therefore on strategy changes or, conversely, strategy persistence. In no case will an agent be deceived by public information. This distinguishes the model for empirical research from psychological approaches that involve deception.


3.2 Model for Empirical Research 


In order to test the influence of expert knowledge or expertise, the experiment has to let participants maximize in a domain where feedback is part of a stable, well-defined problem. Participants can then apply their optimal strategy in a second well-defined domain that includes a small change. Afterwards, they adapt their strategy in an ill-defined but metastable domain with much change hidden from them, in which the strategy from the well-defined domain still leads to maximization. During the well-defined stages, all agents act alone in isolation. In the ill-defined stages, agents will act as a group. The experiment will be based on the thoroughly researched puzzle game "Tower of Hanoi". The multiplayer version of "Tower of Hanoi" is governed by a deterministic 64-state algorithm. This algorithm ensures that every agent of a group has influence over the outcome, but does not necessarily impact it. The algorithm does not change during the ill-defined stages. Also, without communication, no participant can gain full control over the outcome. Therefore, even if the true rules governing the multiplayer version were known, the outcome of an action would remain unknown, which makes these stages ill-defined. However, a group can outperform randomness by sticking to the ideal strategy from the well-defined stages. Finally, the metastable, ill-defined domain will exhibit a small change at some point. This change vastly alters its inner dynamics, so that feedback becomes "chaotic" with high certainty. In theory, however, all stages, well-defined and ill-defined alike, can be solved in the same number of moves. In the well-defined domain, feedback itself will remain stable, i.e. logical from some strategic perspective. If the strategy is not altered in the well-defined domain after the small change has been introduced, performance will be worse, but feedback will remain logical from some strategic perspective. During the metastable, ill-defined stages, feedback will remain seemingly logical from some strategic perspective. It can also become chaotic from that perspective if some agents behave "less than wise". During the instable, ill-defined stages, feedback will be chaotic with high certainty. This might lead participants to interpret chaotic feedback as purely random and every action as equally bad, resulting in an "indifferent" state of mind. Agents may then act blindly in accordance with their routine strategy or seemingly at random. Feedback, and therefore its interpretation, is then used as the defining "atom" of the system. This is in accordance with describing a system by its states as "instable", "indifferent", "stable" and "metastable" (Jeschke & Mahnke, 2013). Figure 3.2 depicts these system states with intuitive diagrams.




Figure 3.2 Considered system states: instable, indifferent, stable, metastable. Source 
Jeschke & Mahnke, 2013, p. 17 
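The statement that all stages can, in theory, be solved in the same number of moves can be checked with the textbook recursive solution of Tower of Hanoi. The following Python sketch is illustrative only. It assumes standard rules without edge jumping and the same number of discs in every stage. It also assumes rod indices 0, 1 and 2 for the left, center and right rods. The function names are hypothetical and not taken from the experiment's software.

```python
def hanoi_moves(n, source, target):
    """Return the optimal move list for n discs from rod `source` to rod `target`.

    Rods are numbered 0 (left), 1 (center) and 2 (right); each move is a
    (from_rod, to_rod) pair.
    """
    if n == 0:
        return []
    spare = 3 - source - target  # the rod that is neither source nor target
    return (
        hanoi_moves(n - 1, source, spare)    # move the n-1 smaller discs out of the way
        + [(source, target)]                 # move the largest disc
        + hanoi_moves(n - 1, spare, target)  # restack the smaller discs on top of it
    )


def minimum_moves(n):
    """Closed form for the optimal number of moves: 2**n - 1."""
    return 2 ** n - 1


if __name__ == "__main__":
    for n in range(3, 7):
        to_right = hanoi_moves(n, 0, 2)   # goal rod: rightmost
        to_center = hanoi_moves(n, 0, 1)  # goal rod: center
        print(n, len(to_right), len(to_center), minimum_moves(n))
```

For n discs, the optimum is 2^n - 1 moves whichever rod is the goal. For three discs, that is seven moves for both the rightmost and the center goal rod.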


Passive change is communicated by visual cues and concerns the "goal rod" of the "Tower of Hanoi" stages. Agents will have to solve Tower of Hanoi three times in a row with the rightmost rod as the goal rod, and then three times in a row with the center rod as the goal rod. The change will not be communicated "actively" but "passively" by visual cues. These cues are adapted for color-blind participants, i.e. conveyed not only by color but also by unannounced text. During the well-defined stages, expertise in solving Tower of Hanoi will be measured. The passive change is considered a non-social, environmental change. Participants who are not immediately aware of the goal rod change will perform worse. After a certain number of well-defined stages, participants will face ill-defined stages, in which the "Tower of Hanoi" game is in fact a multiplayer version. Different types of public information, including no public information, are tested in various information conditions. Again, the goal rod will be changed after the same number of stages as in the well-defined stages. Throughout the ill-defined stages, the same hidden rules apply. This experimental setup is further specified in Table 3.1.
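A memorized routine can be replayed to show why participants who miss the goal-rod change perform worse. The sketch below assumes three discs that start on the left rod (index 0). Following the description above, it also assumes the rightmost rod (index 2) as the goal for the first three games and the center rod (index 1) for the next three. All names are hypothetical.

```python
def optimal_moves(n, source, target):
    """Optimal (from_rod, to_rod) move list for n discs on rods 0, 1, 2."""
    if n == 0:
        return []
    spare = 3 - source - target
    return (optimal_moves(n - 1, source, spare) + [(source, target)]
            + optimal_moves(n - 1, spare, target))


def replay(n, moves):
    """Apply moves to a fresh tower on rod 0 and return the final rod contents."""
    rods = [list(range(n, 0, -1)), [], []]  # bottom disc first; disc 1 is the smallest
    for src, dst in moves:
        disc = rods[src].pop()
        if rods[dst] and rods[dst][-1] < disc:
            raise ValueError(f"illegal move: disc {disc} onto disc {rods[dst][-1]}")
        rods[dst].append(disc)
    return rods


def goal_reached(n, rods, goal_rod):
    """A game is solved when all n discs are stacked on the goal rod."""
    return rods[goal_rod] == list(range(n, 0, -1))


N_DISCS = 3
GOAL_RODS = [2, 2, 2, 1, 1, 1]          # right rod three times, then center rod three times
routine = optimal_moves(N_DISCS, 0, 2)  # strategy learned in the first three games

for game, goal in enumerate(GOAL_RODS, start=1):
    solved = goal_reached(N_DISCS, replay(N_DISCS, routine), goal)
    print(f"game {game}: goal rod {goal}, routine solves it: {solved}")
```

The unchanged routine stays legal but stacks the tower on the wrong rod in games four to six. Reaching the center rod requires a different move sequence of the same length.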


Table 3.1 Model for empirical research: system conditions of online experiment. Source 
own source 


| | Before (passive) change | After (passive) change |
|---|---|---|
| Well-defined (single-player) | Stable system, logical feedback. | Stable system, logical feedback. |
| Ill-defined (multiplayer) | Metastable system, seemingly logical feedback. | Instable system, seemingly chaotic feedback, possibly leading to indifference. |


3.3 Experimental Framework for Research Objectives 


In order to measure the impact of public information and expertise, various information conditions and various forms of logic models have to be categorized. In other words, various information conditions and strategies have to be considered. Even well-defined problems in reality can usually be solved in more than one way. To allow two forms of logic to be valid during the well-defined Tower of Hanoi game, the discs can "jump edges", just as in the "Flag Run" experiments. Therefore, even the well-defined stages can be solved with more than one way of thinking. In addition, during the ill-defined stages of the multiplayer version of Tower of Hanoi, the direction cannot be influenced by the direction buttons. Strunz & Chlupsa (2019) assume that direction buttons trigger a deep intrinsic motive to make them part of an agent's strategy. This makes them ideal for testing non-routine problem-solving performance. Furthermore, the discs also jump edges during the ill-defined stages and are controlled collectively by all agents of a group. Three agents per group were chosen; however, the number of agents per group can be chosen arbitrarily in accordance with the algorithm.
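Edge jumping can give the well-defined game a second valid logic. The sketch below rests on one explicit assumption: that "jumping edges" means the three rods form a cycle. Under this reading, a disc moved "right" from the rightmost rod lands on the leftmost rod, and vice versa. Every legal move is then a single "left" or "right" press. The game can also be solved with the classic iterative rule in which the smallest disc always moves in one fixed direction. This is a hypothetical illustration, not the experiment's implementation, and it ignores the multiplayer algorithm entirely.

```python
def press_direction(src, dst):
    """Assuming wrap-around rods, every move equals one 'right' or 'left' press."""
    return "right" if (dst - src) % 3 == 1 else "left"


def cyclic_solution(n, source=0, target=2):
    """Solve Tower of Hanoi with the iterative 'smallest disc cycles' rule.

    On every odd-numbered move, the smallest disc steps one rod in a fixed
    cyclic direction; on every even-numbered move, the only legal move that
    does not involve the smallest disc is made. Returns (moves, rods).
    """
    rods = [[], [], []]
    rods[source] = list(range(n, 0, -1))  # bottom disc first; disc 1 is the smallest
    # For odd n the smallest disc cycles directly toward the target, for even n away from it.
    step = (target - source) % 3 if n % 2 == 1 else (source - target) % 3
    small = source
    moves = []
    for i in range(2 ** n - 1):
        if i % 2 == 0:
            src, dst = small, (small + step) % 3
            small = dst
        else:
            a, b = (r for r in range(3) if r != small)
            # Move the smaller top disc onto the other rod (or onto an empty rod).
            if not rods[b] or (rods[a] and rods[a][-1] < rods[b][-1]):
                src, dst = a, b
            else:
                src, dst = b, a
        rods[dst].append(rods[src].pop())
        moves.append((src, dst))
    return moves, rods


if __name__ == "__main__":
    moves, rods = cyclic_solution(3, source=0, target=2)
    print(moves)
    print([press_direction(s, d) for s, d in moves])
```

Under this assumption, the smallest of three discs moves "left" on every odd move, wrapping from the left rod to the right rod. This is a direction-based way of thinking about the same seven-move optimum that the target-based logic produces.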

In summary, the experiment will investigate the following general research objectives:
i) the impact of active public information about change on group problem-solving behavior when such change has no influence on strategy performance;
ii) the impact of passive public information about change on group problem-solving behavior when such change does influence strategy performance;
iii) the impact of various forms of active public information, e.g. about social or stochastic change, on agents changing their routine strategy;
iv) the impact of active public information about hidden rules on agents changing their routine strategy;
v) the influence of individual expertise stemming from well-defined learning environments on overcoming routine strategy and on overall performance in ill-defined problem-solving domains.

The experiment measuring these general research objectives is explained in the 
following chapter, after which the specific research questions and hypotheses are 
listed. 
Empirical Research Design


The experiment consisted of three parts: the login stage, the experiment itself and an after-survey. The experiment started with six single-player Tower of Hanoi games to enable learning and to induce routine, referred to as "individual decision-making expertise in routine strategy". The single-player rounds were followed by six three-player Tower of Hanoi games. The agents could solve the first three of these multiplayer games perfectly by sticking to the single-player strategy.

This chapter is divided into three parts. The first part describes the software used to measure behavioral changes in a complex problem-solving game. As two different versions of the software existed, emphasis is placed on the software development process, and an overview of how the experiment evolved is given. The second part describes the participants of the study, including details about their background and where they were recruited. The third part explains what participants had to do during the experiment, how data were collected and how the experiment was structured.


4.1 Development and Materials 


Two different software versions of the experiment exist. The first version was programmed with "zTree",


"a software package for developing and carrying out economic experiments."


(Fischbacher, 2007, p. 172). The software running on zTree was developed in cooperation with a local German IT company. The second version was developed from scratch with the same company. It was embedded in a framework for behavioral experiments called "Curiosity IO", which was also self-developed.

Electronic supplementary material The online version of this chapter (https://doi.org/10.1007/978-3-658-33139-9_4) contains supplementary material, which is available to authorized users.


4.1.1 Software Development Process 


The development processes for both software versions were oriented towards a combination of the classic "Waterfall" software development process and "scrum". "Waterfall" served as the framework, with elements suggested by scrum embedded in it, e.g. the face-to-face meetings.

The software development method scrum 


“assumes that the systems development process is an unpredictable, complicated 
process that can only be roughly described as an overall progression.” 


(Schwaber, 1997, p. 1). "Water-Scrum-Fall" is a hybrid approach, and "Hybrid Agile methods are a reality in most Agile implementations" (West, 2011, p. 9). The "Water-Scrum-Fall" approach offers a "simple set of principles, working practices, and roles for teams to execute (...) and guidance on team organization and transparency" (West, 2011, p. 11), while not excluding traditional development milestones. Here, the "hybrid method in which traditional and agile approaches are combined seemingly provides the 'win-win' situation" (Theocharis, Kuhrmann, Münch, & Diebold, 2015, p. 13). The simple Figure 4.1 from West et al. (2011) accurately depicts the software development process for both experiments. After an extensive meeting ("Water"), the IT company developed the software via scrum with weekly meetings ("Scrum"). It also offered support with bug-fixing and performance testing ("Fall") (West, 2011, p. 10). The development of Curiosity IO, however, relied less on weekly meetings, and face-to-face meetings were no longer recorded in written form.

All milestones of the software development process are listed in chronological order in the appendix table "Software Development Milestones" (see annex 1).
Figure 4.1 Software development process Water-Scrum-Fall. Source West, 2011, p. 10


4.1.2 Legacy Version of Experiment 


Although an experiment using the zTree program was conducted and data were collected, neither the program nor the data were used. Instead, Curiosity IO was developed with an identical experiment, and its resulting data were used for evaluation. The software development process of the zTree experiment, its application and the reasons for abandoning this program and its data are described first. Curiosity IO and its results are discussed afterwards. This "failed" experiment led to new insights and ideas for improving Curiosity IO, so it is a fundamental cornerstone of the entire theoretical and practical process. As this legacy experiment was conducted in the historic "Hopfenpost" building in Munich, it is referred to as the "Hopfenpost experiment". The Hopfenpost experiment is described first, followed by a critical analysis of its problems, which ultimately led to the decision to dismiss the experiment. Software documentation of the zTree software is attached in the appendix (see annex 1).

The first experiment was conducted with the zTree software version from 18 to 22 June 2018 in a rented room in the historic "Hopfenpost" building in Munich. This experiment was conducted "offline" with 264 participants, recruited by two hired companies. The only requirement for all participants was fluency in German. Beforehand, a website had been created to attract potential college participants with prizes and participation fees. However, this approach proved ineffective for recruiting participants. The two companies recruited 169 female and 95 male participants for the experiment. No further restrictions applied to the participants, such as graduation degrees, age or monthly income. Due to technical difficulties, 210 data sets remained usable for analysis. During the five-day timespan, experiments were conducted from 10 am to 5:30 pm in four groups in one room. The equipment consisted of 18 rented laptops, a 28-port switch and 3 backup laptops. Due to hot weather, participants had access to water throughout the experiment. Fees were handed out in cash to each participant after the experiment.

Figure 4.2 shows the process of the Hopfenpost experiment, which was conducted via the zTree version of the Tower of Hanoi game. Players were first instructed to take a seat and to self-report age and sex. Providing an email address was optional and only necessary if one was interested in winning a prize. Every experiment was assigned to either group 1 or group 2. Every group's experiment consisted of four stages. Before the first stage, participants were orally informed about their task. They had to solve several rounds of "Tower of Hanoi" and iteratively answer a questionnaire, and they were told how to correctly use the GUI elements in order to do so. They were not deceived by any false statement. In the first stage, both groups started with the regular one-player version of ToH, followed by a questionnaire. This questionnaire collected self-reported data on perceived stress (Cohen & Williamson, 1988) and on uncertainty (Clampitt & Williams, 2000). Minor changes were made to both questionnaires to adapt their content to the experiment's context, e.g. replacing "months" with "rounds". The questionnaires were also translated into German, and participants were instructed to use the German versions. These two questionnaires were answered after every stage. The second stage consisted of three rounds of the three-player version of Tower of Hanoi, referred to as "Tower of Europe". Three rounds of Tower of Europe were likewise played in stages three and four. The global information given to the two groups differed in stage two. Global information was always provided orally and in German. The first group was informed that they were now sharing control with two other people in the room. The second group received no information other than that a "Please wait." screen would now pop up after each input, as the output calculation differed. In stage three, the first group was told that the directional buttons had not influenced the game at all during the past three games. They were also told that these buttons would not influence it in the future and only changed color when pressed. In stage three, the second group was now given the same information that group 1 had received in stage two: that they were sharing control with two others in the room. The final stage four offered no new information to group 1. It did, however, provide group 2 with the insight about the "dummy effect" of the directional buttons, just as group 1 had received it in stage three.
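The staggered information design described above can be summarized as a small lookup table. The labels below paraphrase the oral announcements and are not identifiers from the zTree software. The helper function is hypothetical. It shows that group 2 receives each piece of information one stage later than group 1, so both groups know the same facts by stage four.

```python
# New information announced orally at the start of each stage (paraphrased labels).
NEW_INFO = {
    1: {1: set(), 2: {"shared control"}, 3: {"dummy direction buttons"}, 4: set()},
    2: {1: set(), 2: set(), 3: {"shared control"}, 4: {"dummy direction buttons"}},
}
# Stage 1 is the single-player game; stages 2 to 4 use the three-player "Tower of Europe".
PLAYERS_PER_GAME = {1: 1, 2: 3, 3: 3, 4: 3}


def known_facts(group, stage):
    """Return everything a group has been told up to and including `stage`."""
    facts = set()
    for s in range(1, stage + 1):
        facts |= NEW_INFO[group][s]
    return facts


if __name__ == "__main__":
    for stage in range(1, 5):
        print(stage, PLAYERS_PER_GAME[stage],
              sorted(known_facts(1, stage)), sorted(known_facts(2, stage)))
```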

The experiment produced more than 400 .csv files of data on the participants' choices and self-reported items. However, the experiment was flawed for many reasons, which are explained in the following.


[Figure 4.2: Group 1: stage 1 regular 1-player version, questionnaire; stage 2 3-player version, information on multiplayer, questionnaire; stage 3 3-player version, information on direction buttons, questionnaire; stage 4 3-player version, no new information, questionnaire. Group 2: stage 1 regular 1-player version, questionnaire; stage 2 3-player version, no information, questionnaire; stage 3 3-player version, information on multiplayer, questionnaire; stage 4 3-player version, information on direction buttons, questionnaire.]


Figure 4.2 Process model of the legacy "Hopfenpost" experiment. Source own source


4.1.3 Problems with Legacy Experiment 


Throughout the experiment, participants were able to communicate with each other. The participants were instructed to remain silent, but vocal signals such as sighing and moaning, eye contact and clicking sounds could not be avoided. Their potential influence on the data cannot be estimated. Another problem was the high effort required to handle the raw data. The individual data sets were linked to the zTree ID rather than to the seat ID. The seat ID was written on the sheet of paper on which each participant self-reported age and sex. Many cross-references therefore first had to be made by hand. When zTree IDs were "shuffled" in order to randomize groups, who was who had to be written down by hand. The paper-based self-report of sex and age served two further purposes. First, it showed which company each participant had been recruited by, which mattered because the two companies offered different wages. Second, it allowed the experimenters to report back truthfully to the companies how many of the participants they had hired actually arrived. Therefore, the experiment was not fully automated, in order to have solid evidence "in paper form".
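The manual cross-referencing described above amounts to a join between the paper sheets and the zTree output via a hand-written seat-to-zTree mapping. The sketch below uses hypothetical field names and made-up placeholder records purely to show the structure. It also counts participants per recruiting company, the figure that had to be reported back.

```python
from collections import Counter

# Made-up placeholder records; all field names are hypothetical.
paper_sheets = {  # seat ID -> self-reported data written on paper
    "S01": {"age": 25, "sex": "f", "company": "A"},
    "S02": {"age": 31, "sex": "m", "company": "B"},
    "S03": {"age": 22, "sex": "f", "company": "A"},
}
seat_to_ztree = {"S01": 7, "S02": 3}               # noted by hand after the IDs were shuffled
ztree_output = {7: {"moves": 9}, 3: {"moves": 7}}  # zTree ID -> recorded choices


def link_records(paper, seat_map, ztree):
    """Join paper sheets to zTree rows; return (linked rows, unlinkable seat IDs)."""
    linked, unmatched = [], []
    for seat_id, sheet in sorted(paper.items()):
        ztree_id = seat_map.get(seat_id)
        if ztree_id is None or ztree_id not in ztree:
            unmatched.append(seat_id)  # a gap in the handwritten mapping
            continue
        linked.append({"seat_id": seat_id, "ztree_id": ztree_id, **sheet, **ztree[ztree_id]})
    return linked, unmatched


linked, unmatched = link_records(paper_sheets, seat_to_ztree, ztree_output)
arrivals_per_company = Counter(sheet["company"] for sheet in paper_sheets.values())
print(len(linked), unmatched, dict(arrivals_per_company))
```

Any seat missing from the handwritten mapping silently drops out of the analysis. The sketch therefore reports such seats explicitly.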

Another problem was that zTree is not well suited to GUI-heavy experiments. It is astonishing that single-player and multiplayer Tower of Hanoi games can be programmed using zTree's rudimentary architecture, but the result is not graphically convincing. Furthermore, zTree does not offer redundant functions, so the entire game was programmed as one huge iterative function. As a result, the first backup notebook, which ran on an HDD, processed the inputs of more than nine participants so slowly that participants started complaining. With more than ten participants, the entire experiment crashed. A notebook with an SSD had to be used in order to run the experiment efficiently enough.

During the experiment, the actions of individual participants could not be monitored. Therefore, the participants had to be monitored by walking behind them in order to see their computer screens. This was sometimes necessary when participants reported problems with mouse control, accidentally closed the application or had other problems outside of experimental relevance. However, when the experimenter walked behind them, some participants reported this to be a major issue. They said they would probably behave differently when observed from behind. Participants who took a long time to finish the single-player game also reported orally that the presence of others who finished faster made them feel uncomfortable, impairing their cognitive stability.

Due to zTree's software restrictions and security concerns, not all inputs could be saved, for example individual presses and clicks on the directional buttons. A group could thus have solved a stage with only seven recorded actions but hundreds of unrecorded inputs, which distorts the information.
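Logging every input event, not only disc moves, makes this distortion concrete. In the sketch below, the event format (kind, detail) is a hypothetical assumption, and the log is made-up illustrative data. With all events kept, a stage solved in seven moves alongside many direction presses becomes visible instead of looking like a clean seven-action solution.

```python
from collections import Counter


def summarize_inputs(events):
    """Count logged events by kind; `events` is a list of (kind, detail) pairs."""
    counts = Counter(kind for kind, _ in events)
    return {
        "all_inputs": sum(counts.values()),
        "disc_moves": counts["move"],             # what the legacy version effectively kept
        "direction_presses": counts["direction"],  # what it could not save
    }


# Illustrative log: a 3-disc stage solved in 7 moves, plus 200 direction presses.
moves = [("move", (0, 2)), ("move", (0, 1)), ("move", (2, 1)), ("move", (0, 2)),
         ("move", (1, 0)), ("move", (1, 2)), ("move", (0, 2))]
log = [("direction", "left")] * 200 + moves
summary = summarize_inputs(log)
print(summary)
```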

The experiment is hard to defend against accusations of "use of deception", modeler bias, and noise from communication between participants. There are three reasons for this. Global information was given orally, no recordings of the non-automated parts of the experiment exist, and the participants were not isolated. In addition, technical problems and bugs resulted in biased data. When a game group contained only one or two participants instead of the required three, the software showed unusual behavior and once even crashed the entire experiment. While the crash only happened once, it could have happened whenever the number of participants in an experiment was not divisible by three.
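A minimal grouping routine shows why group size matters. Participants are shuffled and split into groups of three, and any remainder must be handled explicitly instead of forming an incomplete group. The function and its parameters are hypothetical, not the zTree or Curiosity IO implementation.

```python
import random


def form_groups(participant_ids, group_size=3, rng=None):
    """Shuffle participants and split them into full groups.

    Returns (groups, leftover). Leftover participants must be handled
    explicitly (e.g. rescheduled) rather than forming an incomplete group.
    """
    ids = list(participant_ids)
    (rng or random.Random()).shuffle(ids)
    n_full = len(ids) // group_size
    groups = [ids[i * group_size:(i + 1) * group_size] for i in range(n_full)]
    leftover = ids[n_full * group_size:]
    return groups, leftover


groups, leftover = form_groups(range(1, 18), rng=random.Random(42))
print(len(groups), len(leftover))  # 17 participants: 5 full groups, 2 left over
```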

The questionnaires were slightly changed and translated into German. Therefore, it was not clear whether these questionnaires still validly measured self-reported stress and uncertainty. In addition, the quality of the uncertainty survey is debatable, as its source is a rather unknown proceedings manuscript.

The "offline" participant acquisition and the entire experimental process were very resource-demanding. For this reason, and because of all the problems mentioned above, an "online" version of "Tower of Europe" (that is, the three-player version of Tower of Hanoi) was developed. At first, the software "oTree" was considered. It is an "online software for implementing interactive experiments in the laboratory" (D. L. Chen, Schonger, & Wickens, 2016, p. 88) and runs in any web browser without any application prerequisites. However, as with zTree, "oTree" was found to lack the required graphical capabilities and a modern software development language.


4.1.4 Curiosity IO: Structure and Functionality


Software development of Curiosity IO started in November 2018 and was largely finished in August 2019. The software framework contains a number of classic game-theoretical experiments such as "Prisoner's Dilemma", "Battle of the Sexes", "Nash Bargaining Game", "Optional Prisoner Dilemma", "Public Goods", "Trust Game", "Ultimatum Game", "Dictators Game" and "Public Goods (3 Player)". It also contains "Flag Run" (Strunz & Chlupsa, 2019), "Dynamic Flag Run" and "Tower of Hanoi". Extensive bug-testing and prototype testing of the two main experiments, "Flag Run" and "Tower of Hanoi", took place. Since 18 January 2019, multiple pre-tests and experiments have been conducted. Before the main experiment, 13 sessions with a total of 1,459 participants from "Amazon Mechanical Turk" were run using the "Flag Run" and "Dynamic Flag Run" games. In addition, 4 "Tower of Hanoi" sessions with a total of 150 participants took place. Raw data of all experiments remain saved, and a selection of raw data can instantly be exported in .csv file format. Most screenshots of the software are included in the electronic appendix, which is referred to as "e-appendix" in the following.
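Python's standard csv module can be used to sketch how a selection of raw data might be exported to .csv. The field names, the session filter and the sample records are hypothetical. They only illustrate the idea of exporting a subset of stored records and are not Curiosity IO's actual schema.

```python
import csv
import io


def export_selection(records, fieldnames, session=None):
    """Write records (optionally from one session only) to a CSV string."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any stored fields that were not selected for export.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in records:
        if session is None or row.get("session") == session:
            writer.writerow(row)
    return buffer.getvalue()


raw = [  # made-up placeholder records
    {"session": "toh-1", "participant": 1, "game": 1, "moves": 7, "note": "x"},
    {"session": "toh-1", "participant": 2, "game": 1, "moves": 9, "note": "y"},
    {"session": "flag-1", "participant": 3, "game": 1, "moves": 0, "note": "z"},
]
print(export_selection(raw, ["session", "participant", "game", "moves"], session="toh-1"))
```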

Curiosity IO is a framework for online behavioral experiments that can be run on any device and web browser. Participants can log in to an experiment session by entering the URL https://www.curiosity-data.com. They then have to type in the experiment's "Session Code" (see e-appendix chapter3_1_participant_login). When an experiment session is "open" and the participant has entered the correct "Session Code", the participant automatically begins the experiment. What comes next depends on how the experimenter designed the experiment, e.g. a panel for the participant to provide age and sex, a questionnaire or a game-theoretical experiment.
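The participant login flow can be sketched as a lookup of the entered session code. The data structure and names are hypothetical. The sketch only encodes the two conditions stated above: the session must be open, and the code must match. It then returns the ordered steps the experimenter configured.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExperimentSession:
    player_password: str  # the "Session Code" participants type in
    is_open: bool         # participants can only join open sessions
    steps: List[str]      # what the experimenter configured, in order


def join(sessions: Dict[str, ExperimentSession], entered_code: str) -> Optional[List[str]]:
    """Return the participant's sequence of steps, or None if the login fails."""
    session = sessions.get(entered_code)
    if session is None or not session.is_open:
        return None
    return list(session.steps)


sessions = {
    "hanoi2019": ExperimentSession("hanoi2019", True,
                                   ["age and sex panel", "Tower of Hanoi", "after-survey"]),
    "closed01": ExperimentSession("closed01", False, ["Flag Run"]),
}
print(join(sessions, "hanoi2019"))
print(join(sessions, "closed01"), join(sessions, "wrong"))
```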
Experimenters can reach the admin panel via https://admin.curiosity-data.com.
It is password protected (see e-appendix chapter3_2_experimenter_login). After 
entering the correct password, the main menu of Curiosity IO is unlocked. It 
consists of three panels: “Sessions—Create and analyze test runs”, “Level Edi- 
tor—Create and edit level configurations” and “Survey Editor—Create and edit 
surveys” (see e-appendix chapter3_3_main-menu). 

Choosing “Sessions” leads to a list of all performed experiments (see e- 
appendix chapter3_4_gamesessions). The list displays the experiment’s name, 
type of experiment, date, number of users, and the session code. Pressing the 
red “X” icon on the top right will lead back to the main menu. Choosing the 
icon “New Session” will open up the “Create Session” screen (see e-appendix 
chapter3_5_session). 

The “Create Session” popup offers various options to design a session. Entire 
“experiments” are referred to as “sessions”. It can be closed with the red “X” 
icon, a “Cancel” button, and a session can be created pressing a “Save” button. 
A "Session Name" and a "Player Password" (formerly referred to as "Session Code") are entered in the first two empty text fields. A checkbox lets the experimenter activate or disable the panel that requires participants to self-report their sex and age before they can start a session. Each session can consist of three parts: a pre-survey, an experiment and an after-survey. Pre- and after-surveys are questionnaires, which can be designed in the "Survey Editor". Experiments include all listed game-theoretical experiments, as well as "Flag Run", "Dynamic Flag Run" and "Tower of Hanoi". All these experiments can be chosen from the dropdown menu "Select Game". A session can also consist only of surveys; in this case, "No Game" has to be chosen.
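The options of the "Create Session" popup can be modeled as a small configuration object. In this hypothetical sketch, the field names and the default of the age-and-sex checkbox are assumptions, while the game names come from the list earlier in this section. It shows how the age-and-sex panel, pre-survey, game and after-survey combine into the sequence a participant sees. It also covers the survey-only case with "No Game".

```python
from dataclasses import dataclass
from typing import List, Optional

NO_GAME = "No Game"
GAMES = {
    "Prisoner's Dilemma", "Battle of the Sexes", "Nash Bargaining Game",
    "Optional Prisoner Dilemma", "Public Goods", "Trust Game", "Ultimatum Game",
    "Dictators Game", "Public Goods (3 Player)", "Flag Run", "Dynamic Flag Run",
    "Tower of Hanoi", NO_GAME,
}


@dataclass
class SessionConfig:
    session_name: str
    player_password: str
    ask_age_and_sex: bool = True      # the demographics checkbox (default is an assumption)
    pre_survey: Optional[str] = None  # name of a questionnaire from the Survey Editor
    game: str = NO_GAME
    post_survey: Optional[str] = None

    def __post_init__(self):
        if self.game not in GAMES:
            raise ValueError(f"unknown game: {self.game!r}")

    def participant_sequence(self) -> List[str]:
        """Order in which a participant encounters the configured parts."""
        parts = ["age and sex panel"] if self.ask_age_and_sex else []
        if self.pre_survey:
            parts.append(f"pre-survey: {self.pre_survey}")
        if self.game != NO_GAME:
            parts.append(f"game: {self.game}")
        if self.post_survey:
            parts.append(f"after-survey: {self.post_survey}")
        return parts


survey_only = SessionConfig("Survey only", "pw1", pre_survey="Stress", post_survey="Uncertainty")
print(survey_only.participant_sequence())
```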

Depending on the chosen experiment, various options are now enabled, disab- 
led or added. The two dropdown menus “Select Pre Survey” and “Select Post 
Survey” are always enabled, and list all pre-defined questionnaires created via 
the “Survey Editor”. When the experiments “Flag Run” or “Dynamic Flag 
Run” are selected, the dropdown menu “Select Level Configuration” is enab- 
led, and lists all pre-defined level configurations created via the “Level Editor”. 
When the experiment “Tower of Hanoi” is chosen, multiple elements are added: 
a button “Add TOH Configuration”, three empty text-fields named “Single- 
Player Timer”, “Multi-Player Timer”, “Give-Up Timer” and a checkbox labelled 
"Bot" (see e-appendix chapter3_11_createsession_2). Selecting "Add TOH Configuration" opens the "TOH CONFIG" popup (see e-appendix chapter3_12_tohconfig_1).

The "TOH CONFIG" popup can be closed with the red "X" icon or a "Cancel" button, and a "Tower of Hanoi" procedure can be created by pressing the "Save" button. Such a procedure has to consist of at least one group. Each group can experience a different procedure. Participants are automatically assigned to groups (experiment groups) by group number, without being explicitly informed. This will be explained later in greater detail. "Add Group" adds a new group, which can be deleted by pressing the "Delete" icon or edited by clicking on its entry label (see e-appendix chapter3_13_tohconfig_2). When a group consisting of at least one "game" is edited, a list of its games is displayed (see e-appendix chapter3_14_tohconfig_3). The group "edit" menu can be closed with the red "X" icon or a "Cancel" button, and a group procedure can be created by pressing the "Save" button. One game is always filled in by default. The edit list contains
information about the game ID, number of discs used (min. 3 to max. 10), star- 
ting state, goal state, single- or three-player, and whether or not help-text and 
popup are used. Each “Tower of Hanoi” game consists of three rods. While there 
exist ToH experiments with more than three rods, Curiosity IO does not offer 
more than three for the time being. “Add Game” adds a new game, which can 
be deleted by pressing the “Delete” icon or edited by clicking on its entry label. 
When editing a game, popup menus and text fields can be used to alter number of 
discs, start state, goal state, single- or three-player, and whether a popup should be 
visible (see e-appendix chapter3_15_tohconfig_3). Help-text and popup-text can 
be entered via the empty text-fields. When a popup is active, it has to be closed 
by the participant, in order to engage with the experiment. This can be used in 
order to announce help-text changes or report other important information to the 
participant. 
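
The structure of a “TOH CONFIG” procedure described above (groups, each holding at least one game with its own disc count, start and goal rod, player mode, help text and popup) can be pictured as a small data model. The following Python sketch is only an illustration under assumptions: the class and field names (TohGame, ExperimentGroup, TohConfiguration, start_rod, popup_text and so on) are hypothetical and do not reflect Curiosity IO’s actual internal schema; the sketch merely encodes the constraints stated in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TohGame:
    """One game of a group; field names are hypothetical, not Curiosity IO's schema."""
    game_id: int
    discs: int = 3                    # allowed range: min. 3 to max. 10
    start_rod: int = 1                # 1 = left, 2 = middle, 3 = right
    goal_rod: int = 3
    three_player: bool = False        # False: single-player ToH, True: three-player ToE
    help_text: str = ""
    popup_text: Optional[str] = None  # None: no popup is shown

    def __post_init__(self) -> None:
        if not 3 <= self.discs <= 10:
            raise ValueError("number of discs must be between 3 and 10")
        if self.start_rod not in (1, 2, 3) or self.goal_rod not in (1, 2, 3):
            raise ValueError("Curiosity IO only offers three rods")


@dataclass
class ExperimentGroup:
    # "One game is always filled in by default."
    games: List[TohGame] = field(default_factory=lambda: [TohGame(game_id=1)])


@dataclass
class TohConfiguration:
    groups: List[ExperimentGroup]

    def __post_init__(self) -> None:
        if not self.groups:
            raise ValueError("a procedure has to consist of at least one group")


# Roughly the procedure of the example session in 4.1.5
example = TohConfiguration(groups=[ExperimentGroup(games=[
    TohGame(game_id=1, start_rod=1, goal_rod=2, popup_text="This is the popup."),
    TohGame(game_id=2, start_rod=1, goal_rod=3, three_player=True,
            popup_text="This is the second popup"),
])])
```

Validating in __post_init__ means an impossible configuration (for instance eleven discs or an empty procedure) is rejected as soon as it is created rather than when a participant reaches the game.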

Going back to the “Create Session” menu, various timers can be set. The
“Single-Player Timer” is the amount of time in minutes each participant is given
to solve a single-player ToH game. The timer is displayed to the participant
during the game, even on the popup. When the timer reaches 0, the next level
is started automatically. The “Multi-Player Timer” is the analogous time for each
participant to solve the three-player version of “Tower of Hanoi”, also referred
to as “Tower of Europe” (ToE). The ToE timer starts after each participant has
provided an input. The “Give-Up Timer” is the number of minutes after which a
“Give-Up” button appears during ToE. Since each ToE game requires three
participants, these three players are called a “game-group”. When the “Give-Up”
button is pressed by any participant of a game-group, the experiment ends for
the entire game-group. Marking the “Bot” checkbox activates the “bot-system”.
For the bot-system to work, “Start Wait” or “In Game Wait” has to be filled in
with an integer. When the ToH experiment includes ToE games, game-groups of
three players are required.

Consider first an experiment with only one experiment-group: when participants
join such an experiment, they have to wait until two other participants have
joined. These three participants are then automatically put into one game-group,
which is assigned to the first experiment-group. The next three participants to
join are assigned to the second game-group, which automatically joins the next
experiment-group, if such an experiment-group exists. “Start Wait” indicates the
number of minutes a player has to wait before bots fill the game-group with
either one or two bots. This feature was implemented so that MTurks do not have
to wait too long and can still be offered ethical pay.

Consider next an experiment with more than one experiment-group: when
participants join such an experiment, they join the experiment-groups one after
another, in successive order.
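
The assignment rules above can be summarized in a few lines of code. The Python sketch below forms game-groups of three in order of arrival and hands them to the experiment-groups in successive order; wrapping around to the first experiment-group after the last one, as well as the function names assign_game_groups and bots_to_add, are assumptions made for illustration. The bot rule is modelled on the example session in 4.1.5, where one bot joins after each elapsed “Start Wait” interval.

```python
from typing import Dict, List


def assign_game_groups(participants: List[str], n_experiment_groups: int,
                       group_size: int = 3) -> Dict[int, List[List[str]]]:
    """Form game-groups in order of arrival and assign them to experiment-groups
    in successive order (wrapping around after the last group is an assumption)."""
    if n_experiment_groups < 1:
        raise ValueError("at least one experiment-group is required")
    assignment: Dict[int, List[List[str]]] = {
        g: [] for g in range(1, n_experiment_groups + 1)}
    complete = len(participants) - len(participants) % group_size
    for k, start in enumerate(range(0, complete, group_size)):
        game_group = participants[start:start + group_size]
        assignment[k % n_experiment_groups + 1].append(game_group)
    return assignment  # participants beyond the last full game-group keep waiting


def bots_to_add(humans_in_group: int, minutes_waited: float, start_wait: float,
                group_size: int = 3) -> int:
    """Assumed rule: one bot joins per elapsed "Start Wait" interval until
    the game-group is complete."""
    if start_wait <= 0:
        raise ValueError("Start Wait has to be a positive number of minutes")
    missing = group_size - humans_in_group
    return max(0, min(missing, int(minutes_waited // start_wait)))


print(assign_game_groups([f"p{i}" for i in range(1, 10)], n_experiment_groups=2))
# {1: [['p1', 'p2', 'p3'], ['p7', 'p8', 'p9']], 2: [['p4', 'p5', 'p6']]}
```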

“In Game Wait” is the number of minutes a player has to wait during the game
before bots fill the game-group with either one or two bots. This feature was
implemented so that the participants of a game-group can still finish the
experiment when a co-player of their game-group disconnects. This is a good
alternative to the more rigid “Give-Up” button concept, as it still allows
decision-making data to be produced for the entire experiment.

Bots only solve ToE games. Each bot is a simple algorithm using a
“random-function”. A “random-function” is usually “pseudo-random”: it is a
deterministic algorithm that, starting from a seed value (often derived from
system signals such as the current time), produces numbers, e.g. integers, that
appear random. Using this random-function, the algorithm chooses the small disk
with a 50% pseudo-random chance and the middle/large disk with a 50%
pseudo-random chance. When ToH or ToE is played with three rods, only the small
disk or one other disk can be moved. Therefore, the choice of disk is always
binary, with only three exceptions: in the three states in which all disks are
placed on one rod, only the small disk can be moved. After having “chosen”
either the small or the middle/large disk, the algorithm chooses with 50%
pseudo-random chances whether the disk is moved 1 or 2 spaces. As there are
only three rods, and disks cannot be moved 0 or 3 spaces, this is also a binary
choice. How these two binary choices by three agents lead to a single output in
ToE will be explained later.
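
The bot’s choice rule can be sketched as follows. This is not Curiosity IO’s code; it is a Python illustration assuming that a bot only moves the small disk in the three states where all disks lie on one rod. Python’s random module is itself a pseudo-random generator (a Mersenne Twister), so seeding it makes a simulated run reproducible. The function name bot_action is hypothetical.

```python
import random
from typing import Optional, Tuple


def bot_action(all_disks_on_one_rod: bool,
               rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Return (disk, steps): disk "S" = small, "M" = medium/large; steps 1 or 2.
    No direction is drawn, because direction does not influence a ToE game."""
    if rng is None:
        rng = random.Random()
    if all_disks_on_one_rod:
        disk = "S"                      # only the small disk can be moved
    else:
        disk = rng.choice(["S", "M"])   # 50 % small, 50 % medium/large
    steps = rng.choice([1, 2])          # 50 % one step, 50 % two steps
    return disk, steps


rng = random.Random(2024)  # fixed seed: reproducible pseudo-random sequence
sample = [bot_action(False, rng) for _ in range(10_000)]
share_small = sum(disk == "S" for disk, _ in sample) / len(sample)
print(f"share of small-disk choices: {share_small:.3f}")  # close to 0.5
```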

A click on a listed experiment in the “Sessions” menu opens the session
overview (see e-appendix chapter3_5_session). The session overview lists all
participants and displays four icons at the top right of the screen. The
leftmost icon sets an experiment to “active”. While participants are able to log
in to an “inactive” session and start a pre-survey (when the session is set to
“open”, see below), they will face a waiting screen when reaching an actual
experiment, such as “Tower of Hanoi”. Only when an experiment is set to
“active” will the actual experiment start. Therefore, the “active” button does
not influence pre-surveys, but only experiments. Once active, an experiment
cannot be set back to “inactive”. The second icon from the left “opens” or
“closes” a session with a single click. When a session is “closed”,
participants cannot join it by entering their session code; the “Proceed”
button on the “Join Session” screen will simply not work. When a session is
“open”, participants are able to join it by entering their session code and
can begin, e.g., with a pre-survey. When a session is both “active” and “open”,
participants can come and join the experiment without the experimenter having
to manually open and close the experiment. This feature was implemented for
experiments that have to be conducted over the course of several days, e.g.
using Amazon Mechanical Turk, where participants might come from different time
zones. When experiments are performed with people sitting in one room, the
“active” feature is useful, so that all participants start the experiment
simultaneously after having filled out the questionnaire. In order to make sure
that no further participants can join an experiment, it can be “locked”. The
combined effects of the two buttons are explained in Table 4.1 (own source).


Table 4.1 All permutations of lock and play buttons with effect explanation 


active/inactive button | locked/unlocked button | effect
inactive | locked | Default state. No participant can join the session. No participant who has already joined can start the experiment.
inactive | unlocked | Participants can join the session and start the pre-survey. No participant can start the experiment.
active | unlocked | Participants can join the session. Participants can start the experiment.
active | locked | No participant can join the session. Participants who have already joined can start the experiment.
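
The four rows of Table 4.1 follow from two independent flags. The Python sketch below encodes them; the class and method names are hypothetical. That activation cannot be undone is stated above, and that activating an experiment also closes the session is taken from the example session in 4.1.5; the rest is an illustrative assumption, not Curiosity IO’s implementation.

```python
class SessionControls:
    """Hypothetical model of the "active" and "open/locked" session icons."""

    def __init__(self) -> None:
        self.active = False  # default state (Table 4.1): inactive ...
        self.locked = True   # ... and locked

    def toggle_lock(self) -> None:
        """Second icon: "open" or "close" the session with a single click."""
        self.locked = not self.locked

    def activate(self) -> None:
        """Leftmost icon: there is deliberately no way back to "inactive"."""
        self.active = True
        self.locked = True   # as in the example session, activating also closes it

    def can_join(self) -> bool:
        return not self.locked

    def can_start_experiment(self, has_joined: bool) -> bool:
        # Pre-surveys are not gated by this flag, only the experiment itself.
        return has_joined and self.active


s = SessionControls()
for label, step in [("inactive/locked", None), ("inactive/unlocked", s.toggle_lock),
                    ("active/locked", s.activate), ("active/unlocked", s.toggle_lock)]:
    if step is not None:
        step()
    print(label, s.can_join(), s.can_start_experiment(has_joined=True))
```

Walking through the four steps prints one line per row of Table 4.1, with “join” and “start experiment” permissions matching the table.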


The third icon opens the “Game Session” options menu or “Session
Configuration”. Depending on the experiment, the options menu might differ. The
following explains the options menu as it appears when a “Tower of Hanoi”
experiment is chosen (see e-appendix chapter3_16_sessionconfiguration). The
“Session Configuration” can be closed via the red “X” or the “Cancel” button.
“Restart Session” makes each participant start over from the very beginning of
the entire session, e.g. participants have to redo their pre-survey. The data
already created is saved server-side, but can no longer be downloaded via the
options menu (see “Download Level Data” and “Download Survey” below). “Kick all
and restart Session” has the same effect as “Restart Session”, but all
participants have to rejoin the session via the “Join Session” screen. “Clone
Session” creates an identical copy of a session in the “Game Sessions” list. The
new session appears with the original session name followed by “_clone”. This
comes in handy when a session is complex and takes much time to create: multiple
clones with, e.g., slightly altered timer settings can be tested before a main
experiment is conducted. To do so, a session can be edited via “Edit Session” in
the “Session Configuration” menu. A session cannot be edited once it has been
set to “active”. When an experimenter wants to edit an already “active” session,
the session can be cloned first and then edited. Any session can be deleted by
pressing “Delete Session”. A popup then asks “Are you sure you want to delete
this session?”. In order to remove the session from the “Game Sessions” list,
this action has to be confirmed by pressing “Proceed”. Choosing “Download Level
Data” downloads a .csv file with the experiment’s raw data, which differs from
game to game. When a “Tower of Hanoi” session includes both ToH and ToE,
separate single-player and multiplayer .csv files hold the raw data of the
one-player and three-player games. “Download Survey” downloads a .csv file with
the session’s questionnaire raw data, which differs according to the surveys’
structure and content.
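
The rule that an active session cannot be edited, together with the clone-then-edit workaround, can be illustrated with a small Python sketch. The Session class, its fields, the setting key toh_timer_minutes and the assumption that a clone starts out inactive (which the workaround described above implies) are all hypothetical and not Curiosity IO’s actual implementation.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Session:
    name: str
    active: bool = False
    settings: Dict[str, Any] = field(default_factory=dict)

    def edit(self, **changes: Any) -> None:
        if self.active:
            raise PermissionError("an active session cannot be edited; clone it first")
        self.settings.update(changes)

    def clone(self) -> "Session":
        twin = copy.deepcopy(self)        # identical copy, including all settings
        twin.name = f"{self.name}_clone"  # original name followed by "_clone"
        twin.active = False               # assumed: the clone is editable again
        return twin


original = Session("PhD Thesis Example Session", active=True,
                   settings={"toh_timer_minutes": 1})  # hypothetical setting key
variant = original.clone()
variant.edit(toh_timer_minutes=2)
print(variant.name, variant.settings, original.settings)
```

Using a deep copy keeps the clone’s settings independent, so altering the timer of the clone leaves the original session untouched.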

During an experiment, each participant can be monitored “live”. The “Game
Session” overview displays the participant ID, the current level or stage the
participant is in, the total playing time, and the playing time in the current
stage. Clicking on a listed participant during or after the experiment displays
three different icons (see e-appendix chapter3_6_session-user-options). Choosing
the leftmost icon displays survey answers, which can also be monitored “live”
during the experiment (see e-appendix chapter3_7_user-options_1). The center
icon opens the experiment monitoring tool, where each input of the participant
during the experiment is listed and can also be observed “live” (see e-appendix
chapter3_8_user-options_2). The rightmost icon removes a participant from the
experiment. The session overview also displays an icon “AddBotGroup” when the
experiment is of type “Tower of Hanoi” (see e-appendix chapter3_9_session-bot).
A “bot” is a simple algorithm which randomly chooses inputs, as explained
before. Clicking this icon adds a group of three bots, which will solve any ToE
game. This feature was added to obtain data about how many steps are required to
solve “Tower of Europe” when all agents act randomly.


This concludes the “Sessions” part of the main menu. The “Level Editor” is only
used for “Flag Run” and “Dynamic Flag Run” games. Since this thesis builds upon
raw data from the experiment “Tower of Hanoi”, this main menu option is skipped.

The next main menu option is the “Survey Editor”. Choosing “Survey Editor”
leads to a list of all created surveys (see e-appendix chapter3_17_surveylist).
The list displays each survey’s name, date of creation, date of last
modification and date of last use. Pressing the red “X” icon at the top right
leads back to the main menu. Choosing the icon “New Survey” opens the “Survey
Editor” screen (see e-appendix chapter3_18_surveyeditor_1).

The “Survey Editor” can be closed via the red “X”, or closed and saved via the
“Save” button. By default, a single question already exists. Each question is
listed in the “Survey Editor” and can be assigned to a group. Groups can be
created, edited and deleted by clicking the “Groups” button. The “Groups”
feature has so far only been tested with the “Dynamic Flag Run” game and will
not be explained in further detail. The listed questions can be reordered with
the grey arrow buttons and deleted with the trash-bin button. Pressing “New
Questions” adds a new question. A question can be designed using the features
on the right-hand side. A question can be of type “One Choice”, “Multiple
Choice”, “Scale” or “Free Text”.

Each question of type “One Choice” and “Multiple Choice” can be formulated via a
text field and can consist of multiple answers. Pressing “New Answer” makes a
text field appear, in which the answer can be formulated. The order of answers
can be rearranged via two direction buttons. Special features can be added to an
answer; these features have so far only been used in the “Dynamic Flag Run”
game, and their description is skipped here. An answer can also be deleted via
the trash-bin symbol.

Questions of type “Scale” only have one answer. Its lower and upper end, each
with a corresponding description, can be modified via text fields. An example
would be “very low, 1, 10, very high” (see e-appendix chapter3_19_surveyeditor_2).
Questions of type “Free Text” automatically provide a text field for each
participant.
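
The four question types and their answer constraints can be summarized as a small data model. The Python sketch below is an assumption-based illustration: the names Question, QuestionType, Scale and accepts, the idea of checking a participant’s response against the question type, and the example question text are hypothetical and not part of the documented Curiosity IO interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Optional


class QuestionType(Enum):
    ONE_CHOICE = "One Choice"
    MULTIPLE_CHOICE = "Multiple Choice"
    SCALE = "Scale"
    FREE_TEXT = "Free Text"


@dataclass
class Scale:
    lower_label: str
    lower: int
    upper: int
    upper_label: str


@dataclass
class Question:
    text: str
    qtype: QuestionType
    answers: List[str] = field(default_factory=list)  # choice questions only
    scale: Optional[Scale] = None                      # the single answer of a scale question

    def accepts(self, response: Any) -> bool:
        """Check whether a response fits the question type (illustrative only)."""
        if self.qtype is QuestionType.ONE_CHOICE:
            return response in self.answers
        if self.qtype is QuestionType.MULTIPLE_CHOICE:
            return isinstance(response, (list, set)) and set(response) <= set(self.answers)
        if self.qtype is QuestionType.SCALE:
            return (self.scale is not None and isinstance(response, int)
                    and self.scale.lower <= response <= self.scale.upper)
        return isinstance(response, str)  # FREE_TEXT


rating = Question("How difficult was the task?", QuestionType.SCALE,  # hypothetical question
                  scale=Scale("very low", 1, 10, "very high"))
print(rating.accepts(7), rating.accepts(11))
```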

This concludes the “Survey Editor” part of the main menu and thus all options
available to the experimenter via the Curiosity IO admin panel. In the following,
a “Tower of Hanoi” experiment is shown from the perspective of a participant, in
order to discuss its features and functionality.


4.1.5 “Tower of Hanoi” Example Session


The session consists of a pre-survey, a “Tower of Hanoi” experiment and an
after-survey. The experiment consists of one experiment-group, which consists of
one game-group with one human agent and two bots. The experiment has two games:
one classic single-player ToH game and the three-player version (ToE).

In detail, the pre-survey and after-survey are identical questionnaires, each
consisting of one “One Choice” question with two possible answers, created using
the “Survey Editor” (see e-appendix chapter3_25_phdexample_6). The session’s
experiment (see e-appendix chapter3_20_phdexample_1 and chapter3_21_phdexample_2)
is called “PhD Thesis Example Session” with session code “phd”. Providing sex
and age is set to be mandatory for each participant. ToH and ToE timers are set
to 1 minute. The bot-system is activated, with its waiting time set to 1 minute
before the experiment (“Start Wait”) and to 1 minute during the experiment (“In
Game Wait”). The described surveys, named “Done_Before”, are chosen as pre- and
after-surveys. There is only one experiment-group (see e-appendix
chapter3_26_phdexample_7). The experiment-group holds two games, both with three
discs and with the leftmost rod as the starting rod (Start == 1). The goal rod
of the ToH game is set to be the middle rod (Goal == 2), and the goal rod of the
ToE game is set to be the right rod (Goal == 3). Both games have a help text and
a popup (see e-appendix chapter3_22_phdexample_3). Help texts and popup contents
were set to differ between the two games in this example (see e-appendix
chapter3_23_phdexample_4 and chapter3_24_phdexample_5).

The session is set to “inactive” and “open”. The participant types in the
session code “phd” to join the session (see e-appendix chapter3_27_joinsession).
After that, the participant provides sex and age (see e-appendix
chapter3_28_joinsession) and is immediately brought to the pre-survey after
submitting these details (see e-appendix chapter3_29_phdexample_10).
After providing an answer, the participant faces the “waiting screen” (see
e-appendix chapter3_30_phdexample_11), as the session’s experiment is set to
“inactive”. The experimenter sets the experiment to “active”, which also
automatically “closes” the session (see e-appendix chapter3_31_phdexample_12).
The participant still only sees the “waiting screen”, since two more
participants have to join in order to form one game-group. When at least one ToE
game is part of the experiment, the “Tower of Hanoi” experiment groups
participants at the very beginning, even before a ToH game.
Thus, it is defined from the beginning who will face whom in the second game
(ToE). After 1 minute, a bot called “10000” joins the game-group. After another
minute, a second bot called “10001” joins the game-group (see e-appendix
chapter3_32_phdexample_13). At this moment, three agents are part of the
game-group, and the experiment starts immediately. The participant no longer sees
the “waiting screen” but the instructions popup, with the text “This is the
popup.”, a popup “OK” button to close the popup, the timer displayed on the
popup, and the “helptext label” next to the popup (see e-appendix
chapter3_33_phdexample_14). After closing the popup by pressing “OK”, the
participant sees the actual “Tower of Hanoi” game: a caption titled “Tower of
Hanoi”, with the timer now displayed below, three rods placed on platforms, a
goal rod marked in red with “Goal” written below it to also address colorblind
participants, and the three disks on the left rod in the start-state setup (see
e-appendix chapter3_34_phdexample_15). The participant has to press the small
disk in order to make the “Steps” buttons, “Direction” buttons and “GO” button
appear (see e-appendix chapter3_35_phdexample_16). After having chosen the number
of steps and the direction, the participant confirms the selection by pressing
“GO”, upon which the resulting state is displayed. When the participant manages
to solve the game by positioning the three disks onto the goal rod, or when the
timer runs out, the “Level Completed” screen is displayed (see e-appendix
chapter3_36_phdexample_17). This screen can be skipped by pressing “Next Level”;
however, the screen is skipped automatically when the participant has reached it
not by solving the game but because the timer ran out. After the ToH game, the
participant sees the ToE game, with an instructions popup “This is the second
popup”, an “OK” button to close the popup, and the “helptext label” with “This is
the helptext label with different information now.” written on it (see
e-appendix chapter3_37_phdexample_18).
No timer is displayed, since during a ToE game the timer only starts once all
three agents of the same game-group have provided an input. This is done because
agents of the same game-group might have to wait for their co-agents to reach the
ToE game: e.g. when two agents are still playing the first game (ToH) and the
third agent has already reached the second game (ToE) and provides an input
selection (disk, number of steps and direction, confirmed by pressing “GO”),
this agent faces the “Waiting Screen” until all members of the same game-group
have provided an input selection. After closing the popup, the participant can
choose an input just like in the first game. This time, the goal rod is the
rightmost one (see e-appendix chapter3_38_phdexample_19). After the disk, steps
and direction selection is confirmed, the timer starts (see e-appendix
chapter3_39_phdexample_20). The output results from the input selections of the
one human agent and the two bot agents. After the ToE stage, no “Next Level”
screen is displayed, as it was the final game. Instead, the after-survey is
shown, which in this example is identical to the pre-survey (see e-appendix
chapter3_40_phdexample_21). When the after-survey is completed, the “Thank You”
screen with the participant ID is displayed (see e-appendix
chapter3_41_phdexample_22). MTurks, for example, are instructed to provide the
experimenter with this ID, so that the experimenter can check whether or not the
MTurk has actually finished the session.

This concludes the example experiment. In the next sub-chapter, data output 
is described and explained. 


4.1.6 Example Session Data Output 


Using the options icon in the “Game Session” menu and pressing “Download Level
Data” and “Download Survey”, several .csv files are downloaded (see e-appendix
chapter3_42_phdexample_23): “PhD_Thesis_Example_Session_single_player”
(ToH.csv), which contains raw data about all ToH games that were part of the
“Tower of Hanoi” experiment; “PhD_Thesis_Example_Session_three_player”
(ToE.csv), which contains raw data about all ToE games that were part of the
“Tower of Hanoi” experiment; and “PhD_Thesis_Example_Session_survey”
(Survey.csv), which contains raw data about all surveys that were part of the
session. For more efficient statistical analysis of the thesis-specific
hypotheses, two additional .csv files were added after the main experiment had
been conducted. Tables explaining their variables are given in the appendix,
where these files are referred to as “Master.csv” and “Progress.csv” (see annex 5
and annex 6). The content of Progress.csv and Master.csv is not explained in
further detail in this chapter, as it is covered by the description of the
independent and dependent variables.

Survey.csv contains several types of raw data (see e-appendix
chapter3_43_phdexample_24), all of which are summarized and explained in the
corresponding table in the appendix (see annex 2). The table lists all types of
raw data saved in Survey.csv, an explanation of their meaning, and what the raw
data look like in the example Survey.csv output file.

ToH.csv contains several types of raw data, all of which are summarized and
explained in the corresponding table in the appendix (see annex 3). The table
lists all types of raw data saved in ToH.csv, an explanation of their meaning,
and what the raw data look like in the example ToH.csv output file. Some raw data
will be explained in greater detail in the following sub-chapter.

ToE.csv contains several types of raw data, all of which are summarized and
explained in the corresponding table in the appendix (see annex 4). The table
lists all types of raw data saved in ToE.csv, an explanation of their meaning,
and what the raw data look like in the example ToE.csv output file. Long raw data
names were shortened to save table space. Some raw data will be explained in
greater detail in the following sub-chapter.

Tables in the appendix (see annex 2–6) show all raw data exported to .csv format.
The next sub-chapter explains the listed data in more detail, such as how the
data are created, supported by data examples derived from the Curiosity IO
example session described in the previous sub-chapters.


4.1.7 Response Time and Input 


Response time creation differs for human and bot agents. While response times of
bot agents are simulated, as their input order is randomly assigned, response
times of humans are always server-dependent. Due to technical issues, response
times are not reliable enough for precise statistical analyses and are therefore
disregarded.


4.1.8 States Derived from State-Space 


Data such as “start_state” in ToH.csv and ToE.csv require a state-space of the
experiment. When operators are included, the state-spaces of ToH and ToE differ,
as the former only requires a single action to produce an operator, whereas the
latter requires three actions. However, the resulting output states of ToE are
identical to those of ToH: if the three actions which produce an operator in ToE
are ignored or modelled as being “intrinsic” to the operator, the ToH and ToE
state-spaces are isomorphic.

Figure 4.3 shows the state-space of both ToH and ToE (Knoblock, 2000, p. 3),
with integers added to each node. It consists of 27 nodes, representing all
possible states, which are visually represented in the “Tower of Hanoi”
experiment. It also contains double-headed (i.e. bidirectional) edges,
representing all possible transitions between states by the corresponding
operator. At 24 nodes, three different operators exist. This is not the case in
the possible “start” states (not to be confused with start_state), i.e. the
states in which all disks lie on one rod, which are states 1, 8 and 15. Here,
only two operators exist. From every state, each of the states 1, 8 and 15 can be
reached within at most seven operators, and the shortest path between any two of
these three states consists of exactly seven operators (2^3 − 1 for three disks).
Since only these three states can be set as goal states, there always exists a
path of at most seven operators towards the goal state, and the ideal path from
one of the other two of these states consists of exactly seven operators.

Figure 4.3 Tower of Hanoi state-space model with the corresponding integers.
Source: Knoblock, 2000, p. 3; red integers added by the author
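
The counts given above (27 states, 24 states with three operators, 3 states with two operators, and shortest paths of seven operators between the states with all disks on one rod) can be checked by enumerating the state-space. The Python sketch below uses its own state encoding, a tuple holding the rod of each disk, and does not reproduce the integer labels of Figure 4.3; the function names successors and distances_from are hypothetical.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Tuple

N_DISKS = 3
State = Tuple[int, ...]  # state[i] = rod (0, 1, 2) of disk i; disk 0 is the smallest


def successors(state: State) -> List[State]:
    """All states reachable by one operator: move a top disk onto another rod,
    never placing a larger disk on a smaller one."""
    result = []
    for rod_from in range(3):
        on_from = [d for d, r in enumerate(state) if r == rod_from]
        if not on_from:
            continue
        disk = min(on_from)  # the top disk is the smallest disk on that rod
        for rod_to in range(3):
            on_to = [d for d, r in enumerate(state) if r == rod_to]
            if rod_to == rod_from or (on_to and min(on_to) < disk):
                continue
            new = list(state)
            new[disk] = rod_to
            result.append(tuple(new))
    return result


def distances_from(source: State) -> Dict[State, int]:
    """Breadth-first search; operators are reversible, so these are also
    the distances towards `source` used as a goal state."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for t in successors(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist


states = list(product(range(3), repeat=N_DISKS))
degrees = [len(successors(s)) for s in states]
corners = [tuple([rod] * N_DISKS) for rod in range(3)]  # all disks on one rod
print(len(states), degrees.count(3), degrees.count(2))   # 27 24 3
print(distances_from(corners[0])[corners[2]])             # 7
```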


4.1.9 Move States 


In ToE.csv, the “move_state” data is an integer from 1 to 64 referring to all
possible three-player action combinations leading to an operator (see e-appendix
chapter3_Algorithm-States_new_appendix_44). An action consists of a chosen disk
and a number of steps. The chosen direction only matters when a certain type of
logic is assigned, which will be explained in a later sub-chapter.

In states 1, 8 and 15, only the small disk can be chosen. In any other state,
agents can either choose the small disk or one of the remaining larger disks.
With three rods, there is no state in which an agent can choose between more
than two disks in total. Therefore, each agent faces a binary choice when
deciding which disk to move; the exceptions are states 1, 8 and 15. Each agent
can choose between 1 or 2 steps in any state, which is also a binary choice.
Therefore, an individual action can be described by two symbols: “S” for “Small
Disk” or “M” for “Medium or Large Disk”, followed by “1” for “One Step” or “2”
for “Two Steps”. With four possible actions per agent and three agents, there
are 4^3 = 64 possible combinations, which is why “move_state” ranges from 1 to
64. Any operator consists of three actions. The operator referred to by
“move_state” 1 is “S1, S1, S1”, meaning that all three agents have chosen to move
the small disk one step. Another example is “move_state” 55, with the actions
“M1, S2, M2”. Here, the first action selection was “Move medium or large disk one
space”, the second action selection was “Move small disk two spaces”, and the
third action selection was “Move medium or large disk two spaces”.
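
The range 1 to 64 can be reproduced with a base-4 encoding of the three actions. The full mapping is only given in the e-appendix, so the ordering below is an assumption: it ranks the individual actions as S1, S2, M1, M2 and gives the third action the highest weight, which happens to reproduce the three values quoted in this chapter (1 for “S1, S1, S1”, 55 for “M1, S2, M2” and 29 for “S1, M2, S2”). The function names move_state and actions_of are hypothetical.

```python
from itertools import product
from typing import Tuple

# Assumed ranking of individual actions (not taken from the e-appendix table).
ACTIONS = ["S1", "S2", "M1", "M2"]


def move_state(first: str, second: str, third: str) -> int:
    """Encode three chronologically ordered actions as an integer from 1 to 64."""
    a, b, c = (ACTIONS.index(x) for x in (first, second, third))
    return 1 + a + 4 * b + 16 * c


def actions_of(state: int) -> Tuple[str, str, str]:
    """Inverse of move_state."""
    if not 1 <= state <= 64:
        raise ValueError("move_state must lie between 1 and 64")
    k = state - 1
    return ACTIONS[k % 4], ACTIONS[(k // 4) % 4], ACTIONS[k // 16]


print(move_state("S1", "S1", "S1"), move_state("M1", "S2", "M2"))  # 1 55
print(actions_of(29))                                            # ('S1', 'M2', 'S2')
all_codes = sorted(move_state(*combo) for combo in product(ACTIONS, repeat=3))
assert all_codes == list(range(1, 65))  # 4 ** 3 = 64 distinct combinations
```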


4.1.10 Operator Output Function 


Each set of three actions leads to one specific operator (see e-appendix
chapter3_Algorithm-States_new_appendix_44). The idea of ToE was that each agent
would hold some control over the game, and that the direction buttons would not
influence the game at all. The order of information, i.e. the order of actions,
matters for calculating the operator. It was important that no participant would
be able to gain an advantage over others; the algorithm was therefore implemented
in such a way that no agent would be able to gain more control than the other
players. This was achieved by restricting communication: as the order of
information matters, no agent can be sure at which position its action will be
listed. The algorithm was also built in such a way that it can be used for more
complex games or state-spaces than “Tower of Hanoi”, e.g. a 4-rod version of
“Tower of Hanoi”. The ruleset for the algorithm is derived from a meta-logic and
adjusted to the experiment’s degree of complexity. How the algorithm (function)
determines the operator (output) from three actions (input) is explained next.

Three inputs are received, e.g. “S1, M2, S2”, listed here in chronological order.
Only the first two inputs are regarded, and they are only checked for “choice of
disk”, in this case “S, M”. This determines “the Decider” and the
“Direction-Deciders”. The Decider’s action indicates which disk is moved and how
far it is moved. Table 4.2 (own source) shows all possible permutations.

For example, the input “S1, M2, S2” leads to “S1” being the Decider action,
meaning that the small disk will be moved one space. Next, the direction has to
be obtained. In order to do so, the inputs of the Direction-Deciders have to be
regarded. The Direction-Deciders are the two agents who are not the Decider; in
this example, the 2nd and 3rd actions are Direction-Decider actions, as the 1st
action is regarded as the Decider action. In this way, each agent has an
influence on the game, while the chance of any agent using a strategy to always
be the Decider or always be a Direction-Decider is nearly eliminated, and no
agent ever holds “full power” over disk, steps and direction selection.

Table 4.2 All possible input combinations leading to resulting “Direction-Deciders”


Possible “choice of disk” input (first two, by time) | Resulting “Decider” action | Resulting “Direction-Decider” actions
S, S | 2nd action committed | 1st & 3rd action committed
S, M | 1st action committed | 2nd & 3rd action committed
M, S | 3rd action committed | 1st & 2nd action committed
M, M | 2nd action committed | 1st & 3rd action committed


In order to define the direction in which a disk moves, the two Direction-Decider
inputs are regarded. Now the inputs are only checked for “choice of steps”: in
this example, “M2, S2” are the Direction-Decider inputs, and the two values
checked are “2, 2”. The Direction-Deciders’ actions only indirectly indicate the
direction in which the disk is moved, as the Decider’s “choice of steps” is also
considered. Table 4.3 (own source) shows all possible permutations that decide
the direction of the disk.

In the example above, the Direction-Deciders’ input “2, 2” and the Decider’s
input “1” lead to the direction of the disk being “left”. The resulting operator
corresponds to “move_state” 29, in which the small disk is moved 1 step to the left.


Table 4.3 All possible input combinations of direction-deciders leading to direction of disk 


Possible “choice of steps” input by Direction-Deciders | Possible “choice of steps” input by Decider | Resulting direction
1, 1 | 1 | left
1, 1 | 2 | right
1, 2 | 1 | right
1, 2 | 2 | left
2, 1 | 1 | right
2, 1 | 2 | left
2, 2 | 1 | left
2, 2 | 2 | right
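
Tables 4.2 and 4.3 together define a lookup function from three chronologically ordered actions to an operator (disk, steps, direction). The Python sketch below transcribes the two tables directly; it ignores the more general meta-logic mentioned above, and the names toe_operator, DECIDER_INDEX and DIRECTION are hypothetical.

```python
from typing import Sequence, Tuple

# Table 4.2: disk choices of the first two actions -> position of the Decider
# action (0-based, chronological); the other two are Direction-Decider actions.
DECIDER_INDEX = {("S", "S"): 1, ("S", "M"): 0, ("M", "S"): 2, ("M", "M"): 1}

# Table 4.3: (Direction-Decider steps, Decider steps) -> direction of the disk
DIRECTION = {
    ((1, 1), 1): "left",  ((1, 1), 2): "right",
    ((1, 2), 1): "right", ((1, 2), 2): "left",
    ((2, 1), 1): "right", ((2, 1), 2): "left",
    ((2, 2), 1): "left",  ((2, 2), 2): "right",
}


def parse(action: str) -> Tuple[str, int]:
    """'M2' -> ('M', 2)"""
    return action[0], int(action[1])


def toe_operator(actions: Sequence[str]) -> Tuple[str, int, str]:
    """Map three chronologically ordered actions to (disk, steps, direction)."""
    if len(actions) != 3:
        raise ValueError("a ToE operator needs exactly three actions")
    parsed = [parse(a) for a in actions]
    decider = DECIDER_INDEX[(parsed[0][0], parsed[1][0])]
    disk, steps = parsed[decider]
    dd_steps = tuple(parsed[i][1] for i in range(3) if i != decider)
    return disk, steps, DIRECTION[(dd_steps, steps)]


print(toe_operator(["S1", "M2", "S2"]))  # ('S', 1, 'left'), as in the example above
```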


The algorithm to determine the operator is purposefully more complex than
required to fairly share control, so that it can also be used for more complex
state-spaces. Since an operator can lead to an illegal state, such as a bigger
disk lying on top of a smaller disk, the next sub-chapter explains how an
operator leads to the resulting output state, which can be observed by the
agents. For this, a more fundamental definition of a “Tower of Hanoi” game is
provided, and the differences in handling illegal moves in ToH and ToE games are
shown.


4.1.11 State Output Function 


When designing the ToE game, several choices had to be made as to how closely
ToE should be related to ToH. The core idea in designing ToE was to “still play a
Tower of Hanoi” game. Therefore, the core rules that make a game a “Tower of
Hanoi game” had to be defined. These “axioms” of ToH are:


— A move is a state, which differs from the last state 
— No state may show a larger disk lying on a smaller disk 
— The game consists of three rods 


The first axiom makes it impossible to move a disk and bring it back to its
original position within the same move. The second axiom never allows a move
resulting in a state where a bigger disk lies on top of a smaller disk. The third
axiom greatly limits the game’s complexity and state-space.

In addition to these axioms, each agent of ToE was supposed to have an influence
on the game, and this control should be fairly distributed. However, in some ToE
situations, individually legal actions can result in a collectively illegal move.
What defines an illegal move differs between a ToH and a ToE game, since in the
latter the direction buttons do not influence the game, whereas the former is
played alone.

During a ToH game, pressing the “GO” button when a number of steps but no
direction has been chosen results in a “No Direction” error message (see
e-appendix chapter3_45_nodirection_error). During a ToH game, trying to move a
bigger disk onto a smaller disk results in a “Wrong Move” error message (see
e-appendix chapter3_46_wrongmove_error).

During a ToE game, pressing the “GO” button after choosing a number of steps but
no direction does not result in an error message, because such an error message
could be regarded as deception. As the direction buttons do not influence the
game, an error message stating that a direction has to be selected would
implicitly claim that the direction buttons do influence the game. Therefore, no
such error message pops up when no direction button is chosen. During a ToE
game, when the resulting operator, consisting of three individual actions, would
result in an illegal move, axiom two would be violated. Following the meta-logic
of the algorithm, such an operator is “corrected”. Ignoring the more complex
meta-logic, the resulting solution is that an “M” disk simply “follows its
direction” until it reaches a rod where it can be placed. This cannot be the
original rod, as this would violate axiom one. Therefore, no error message exists
for this case. This approach differs from the legacy zTree version, where in one
instance such an error message was displayed to the ToE agents; however, this was
due to a wrong implementation of the algorithm, which is another reason for
discarding the legacy experiment.
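The simplified correction rule can be sketched as follows, using a hypothetical state encoding (rod index per disk, 0 = left, 1 = center, 2 = right, disks ordered small to large). The sketch assumes that the chosen disk is the top disk of its rod and that rods wrap around (“no borders”); it ignores the more complex meta-logic of the actual algorithm, and `resolve_move` is an illustrative name.

```python
NUM_RODS = 3  # 0 = left, 1 = center, 2 = right


def resolve_move(state, disk, steps, direction):
    """Move `disk` by `steps` in `direction`, correcting illegal landings.

    The disk keeps following its direction (wrapping around at the outer
    rods) until it reaches a rod that is not its origin (first axiom) and
    holds no smaller disk (second axiom). Returns the new state, or None
    if no such rod exists. Assumes `disk` is the top disk of its rod.
    """
    delta = 1 if direction == "right" else -1
    origin = state[disk]
    target = (origin + delta * steps) % NUM_RODS
    for _ in range(NUM_RODS):
        blocked = any(state[s] == target for s in range(disk))
        if target != origin and not blocked:
            new_state = list(state)
            new_state[disk] = target
            return tuple(new_state)
        target = (target + delta) % NUM_RODS  # follow the direction one more rod
    return None
```

For a configuration consistent with the state 17 example described below (small disk on the right rod, medium disk on the left rod, large disk on the center rod), `resolve_move((2, 0, 1), 1, 1, "left")` first reaches the right rod, finds the small disk there, continues one more rod to the left and returns `(2, 1, 1)`, the same result as `resolve_move((2, 0, 1), 1, 1, "right")`.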

In both ToH and ToE games, disks can jump edges. In other words, when the small
disk is moved two steps to the right, the resulting state always equals the state
reached by moving the disk one step to the left. For example, when starting the
game (ToH and ToE) in state 1, an operator equal to “S2 right” results in state 2,
just as an operator equal to “S1 left” does. In neither ToH nor ToE does an error
message pop up when a disk is moved “out of bounds”. In other words, contrary to
what the graphical representation of the “Tower of Hanoi” experiment might
suggest, there are “no borders” between the left rod and the right rod. This
feature was introduced to make ToH more similar to the ToE game, as the latter
requires “no borders” for the algorithm to work. A similar ruleset was used in the
“Flag Run” game, as that experiment rests on the same meta-logic.
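The “no borders” rule amounts to counting rods modulo three. The short Python sketch below, with the hypothetical helper `target_rod` and rods numbered 0 (left), 1 (center) and 2 (right), checks the claim that two steps to the right always end on the same rod as one step to the left.

```python
NUM_RODS = 3  # 0 = left, 1 = center, 2 = right


def target_rod(rod: int, steps: int, direction: str) -> int:
    """Rod reached under "no borders": moving past an outer rod wraps around."""
    delta = 1 if direction == "right" else -1
    return (rod + delta * steps) % NUM_RODS


# Two steps right always lands where one step left does, from every rod.
for rod in range(NUM_RODS):
    assert target_rod(rod, 2, "right") == target_rod(rod, 1, "left")

print(target_rod(0, 1, "left"))  # 2: from the left rod, one step left reaches the right rod
```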

An example of how an illegal move is corrected during a ToE game is provided in
the following. Consider that state 17 has been reached and the goal rod is the
center rod. The ideal ToH operator would be “M1 right”, or “M2 left” when applying
the “no borders” logic. During a ToE game, all three agents could use their
single-player logic and choose “M1, M1, M1”, which is “move_state” 43 and results
in the middle disk being moved one step to the left. The middle disk would jump
edges from the left rod and land on the right rod, counting as “one step to the
left”. This would result in an illegal state, as the small disk rests on the right
rod. The medium-sized disk therefore goes one more step to the left, landing on
the center rod, where it may “legally” be positioned. This results in state 16,
which is ultimately what the three agents desired. Therefore, when the medium or
large disk is chosen, it always lands on the only legal spot. This is not because
the algorithm was designed that way, but because the “playing field”, consisting
of just three rods, is so limited that it may look this way. The third axiom thus
works in favor of the agents: when all three agents agree on one disk being
moved, such as “Sx1, Sx2, Sx3” or “Mx1, Mx2, Mx3”, the disk agreed upon is always
moved. If that disk has to be selected in order to follow the most efficient path,
and if it is the medium or large disk, the move always results in the ideal state,
regardless of the choice of steps. This is the reason why sticking to the ToH
logic during a ToE game outperforms randomness, as will be explained later.

Individual inputs can differ from the collective operator, as the following
example shows. Consider that state 14 has been reached, with the center rod as the
goal rod. The ideal single-player solution would be “S1 right”, or “S2 left” when
applying the “no borders” logic. During a ToE game, all three agents could use
their single-player logic and choose “S1, S1, S1”, assuming no one had figured out
that disks jump edges. This action input equals “move_state” 1 and results in the
small disk being moved one step to the left. The small disk would jump edges from
the left rod and land on the right rod, counting as “one step to the left”. This
would result in state 16, which was probably neither desired nor expected by the
three agents.

The two preceding examples make clear that during a ToE game, several “logical”
perspectives exist and that a ToE operator does not necessarily equal what an
agent expected to happen. Whether or not an individual action follows a given
logic, and whether or not it results in an expected state, is also recorded in the
raw data. The following sub-chapter explains the logic and expected-state data.


4.1.12 Logic and Expected States 


The “Flag Run” experiment showed that participants developed different
strategies, each stemming from a different logic, but that all effective
strategies relied on having obtained “true rules” governing the game (Strunz &
Chlupsa, 2019). During the “Flag Run” game, some participants even came up with
effective strategies not anticipated by the experimenters; still, they also had to
build their strategy upon having figured out a “true rule”. Examples of such “true
rules” in the “Flag Run” game were that the game piece can jump edges and that the
direction buttons did not influence the game at all. These two “true rules” are
also implemented in the “Tower of Hanoi” experiment: during a ToH or ToE
experiment, the disks can jump edges, and the ToE game is not influenced by the
direction buttons. Another “true rule” for ToE is that the disk, steps and
direction are decided by all three agents as a collective, calculated by a complex
algorithm, which makes it very unlikely or even impossible for one individual
agent to control the game alone. However, participants can beat randomness by
sticking to the ToH logic, as three actions that agree on a disk result in this
disk being moved.

Given the two examples from the former sub-chapter, it has already been described
that participants may act according to different “logical” approaches. As the
results of the “Flag Run” experiment showed, most participants who effectively
solved the NRP stuck with a single form of logic as soon as feedback confirmed
this logic to be effective. Again, consider that state 14 has been reached, with
the center rod as the goal rod. The ideal single-player solution would be “S1
right”, or “S2 left” when applying the “no borders” logic. If no agent had figured
out during the ToH game that pieces “jump edges” and that there are “no borders”
between the left and right rod, the “S2 left” solution might seem illogical. This
is because each agent is only locally informed and is missing information about
the true nature of the game, analogous to the island Gedankenexperiment. This
might lead to all agents providing the input “S1, S1, S1” with direction “right”,
leading to state 16, which is unfavorable. However, one agent might have figured
out the “no border” logic and apply “S2” with direction “left”. If the agents were
able to inspect their co-agents’ inputs, the two remaining agents would find “S2”
with direction “left” to be an illogical move. As a reminder, during a ToE game,
the directions chosen by the players do not have any effect whatsoever. Assuming
one agent chooses “S2” and two agents choose “S1”, three different “move_states”
are possible, depending on the order of the inputs: move_state 2 (“S2, S1, S1”)
moves the small disk one step to the right, leading to the desired goal state 15;
move_state 5 (“S1, S2, S1”) moves the small disk two steps to the right, leading
to the unfavorable state 16; move_state 17 (“S1, S1, S2”) moves the small disk one
step to the right, leading to the desired goal state 15.

If move_state 2 or 17 occurred, this could confirm a locally logical, but globally
illogical strategy. The two agents who had chosen input “S1” with direction
“right”, and who had not figured out the true rule of “disks jumping edges”, would
find the input “S2” with direction “left” to be illogical. From the perspective
of the single agent who used the “no border” strategy, both inputs are logical
solutions. However, all of these logics are globally imperfect, as the direction
of the disks cannot be influenced by pressing the direction buttons. This example
shows the analogy to the island Gedankenexperiment: depending on individual
experience, individual logics are applied, which can be either confirmed or denied
by environmental conditions. However, since the agents decide collectively,
negative feedback cannot be attributed to a wrong strategy with certainty. Group
performance can still benefit from the “traditional strategy” acquired in the
first ToH game, since agreement on the optimal disk outperforms randomness.

While the rules of the game do not change, the goal rod’s position, simulating an
environmental condition, can have a major influence on a strategy’s performance.
A game-group can solve a ToE game optimally in 7 moves by applying the most
intuitive “border” ToH logic when the goal rod is the right rod (goal == 3). Since
agents cannot communicate, even agents who had obtained the Excel sheet with all
64 input permutations could not fully control the game, as they would have to
communicate their input order. With the center rod as the goal rod, the most
intuitive “border” ToH logic fails to solve the game in 7 moves and would probably
lead to unfavorable and unexpected results. This environmental condition
(goal == 2) will have an impact on any strategy’s performance. Depending on the
game setup, feedback, valence weighting bias, routine strength, logic applied and
intrinsic motives, a human agent will either alter its strategy or keep using it.
It is therefore important to cover as many “logics” as possible, in order to make
sense of the agent’s strategy and to measure whether an input led to an “expected”
output state.

All logic models and “expected” output states are saved as Booleans in ToH.csv
and ToE.csv (see e-appendix chapter3_dir_or_nodir_states_appendix_47 and
chapter3_exp_states_appendix_48). With their necessity explained, the following
describes which logic models are saved, how they are created and how “expected”
and “unexpected” states are distinguished.

During a single-player ToH game, two different “logic models” are measured.
They are saved in ToH.csv as the Booleans “logic” and “no_border_logic”. When the
Boolean of the respective “logic model” equals 1, the action is regarded as
matching this “logic model”; when it equals 0, the action is regarded as not
matching it.

In ToH, an action is saved as matching “logic” (Boolean equal to 1) when the
output state follows the ideal path without the disk “jumping edges”. This path
depends on the “start” and “goal” rod positions, “start_state”, “input” and
“direction”. An action is saved as matching “no_border_logic” (Boolean equal to 1)
when the output state follows the ideal path with the disk “jumping edges”. The
ideal path is the path along which the goal is reached in 7 moves. When the playing
piece deviates from the ideal path, both “logic models” assign a Boolean of 0. When
the “start_state” is the result of such a deviation, the following move is
assigned a Boolean of 1 if it follows the ideal path again. This is illustrated
by the following example. All possible configurations that lead to “logic” or
“no_border_logic” are listed in the electronic appendix (see e-appendix
chapter3_dir_or_nodir_states_appendix_47).

For instance, consider a ToH game with three disks, the starting rod being the
left rod and the goal rod being the right rod. In this case, the ideal path would
lead to “state” 2, the ideal operator for “logic == 1” would be “S2 right”, and the
ideal operator for “no_border_logic == 1” would be “S1 left”. Studies of “Tower of
Hanoi” performance showed that the first move is often the move that deviates most
from the ideal path. Assume the agent deviates from the ideal path at the first
move by choosing the action “S1 right”. State 9 is reached, and the Booleans of
both “logic models” are assigned a value of 0. The ideal path would now lead to
“state” 2, and the ideal operator for “logic == 1” would be “S1 right”. If the
agent corrects its deviation with “S1 right”, state 2 is reached and a Boolean of
1 is assigned to “logic”, even though the “start_state” was outside of the original
ideal path. A Boolean of 0 is assigned to “no_border_logic”, since the disk did not
jump edges.
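The assignment of the two ToH Booleans can be sketched as follows. The sketch assumes that the ideal next state is already known (e.g. looked up from the listed configurations), uses a hypothetical rod-per-disk encoding (0 = left, 1 = center, 2 = right; disks ordered small, medium, large), and treats a move as “jumping edges” when the unwrapped target lies outside the three rods; `toh_logic_booleans` is an illustrative name.

```python
NUM_RODS = 3  # 0 = left, 1 = center, 2 = right


def toh_logic_booleans(start_state, disk, steps, direction, ideal_next_state):
    """Return (logic, no_border_logic) for a single ToH action.

    start_state / ideal_next_state give the rod of each disk
    (index 0 = small, 1 = medium, 2 = large); ideal_next_state is
    assumed to be taken from the precomputed ideal 7-move path.
    """
    delta = 1 if direction == "right" else -1
    raw_target = start_state[disk] + delta * steps
    jumped_edge = not 0 <= raw_target < NUM_RODS
    output_state = list(start_state)
    output_state[disk] = raw_target % NUM_RODS
    on_ideal_path = tuple(output_state) == tuple(ideal_next_state)
    logic = int(on_ideal_path and not jumped_edge)
    no_border_logic = int(on_ideal_path and jumped_edge)
    return logic, no_border_logic


all_left = (0, 0, 0)      # three disks on the left rod
small_right = (2, 0, 0)   # small disk moved to the right rod
print(toh_logic_booleans(all_left, 0, 2, "right", small_right))  # (1, 0)
print(toh_logic_booleans(all_left, 0, 1, "left", small_right))   # (0, 1)
```

Under this encoding, the deviation in the example (small disk on the center rod) followed by “S1 right” gives `toh_logic_booleans((1, 0, 0), 0, 1, "right", (2, 0, 0)) == (1, 0)`, matching the Booleans described in the text.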

During a three-player ToE game, where the direction buttons do not influence the
disks, several different “logic models” are measured. Players can also confirm an
input without having pressed a direction button, which is saved as “n” or “none”.
ToE logic models are saved in ToE.csv with a Boolean assigned to them. When the
Boolean of the respective “logic model” equals 1, the action is regarded as
matching this “logic model”; when it equals 0, the action is regarded as not
matching it.

The ToE “logic models” are “framed_logic”, distinguished by “dir”, “nodir” and
“ideal”, and “no_border”, likewise distinguished by “dir”, “nodir” and “ideal”.
Each “logic model” evaluates individual actions, not the collective operator. The
prefixes “first_player”, “second_player” or “third_player”, written in front of
each “logic model” attribute, indicate which agent’s action is considered.

The logic model “framed_logic” with “dir”, short for “direction”, is isomorphic
to ToH “logic”. When participants play one or multiple games of ToH, routine
strengthened by feedback probably leads human agents to “carry” the single-player
ToH logic into the three-player ToE domain.

A Boolean of 1 is assigned to “framed_logic” with “nodir”, short for “no
direction”, when an action matches the “framed_logic_dir” input, disregarding
direction. For example, consider a ToE game with three disks, the starting rod
being the left rod and the goal rod being the right rod. In this case, the ideal
path would lead to “state” 2, the action for “framed_logic_dir == 1” would be “S2
right”, and the action for “framed_logic_nodir == 1” would be “S2 left”, “S2
right” or “S2 none”. This logic is measured because during the legacy experiment,
participants had orally reported to the experimenter that they kept pressing
direction buttons, even though they had been informed that the buttons did not
influence the game. They kept pressing the direction buttons arbitrarily, either
always or frequently, without using “framed_logic_dir”. They reported doing so to
“stay in rhythm”, “out of routine” or “because they did not feel like ignoring the
direction buttons altogether”.

A Boolean of 1 is assigned to “framed_logic” with “ideal” when an action matches
the “framed_logic_dir” input and direction “none” is chosen. For example,
consider a ToE game with three disks, the starting rod being the left rod and the
goal rod being the right rod. In this case, the ideal path would lead to “state”
2, the action for “framed_logic_dir == 1” would be “S2 right”, and the action for
“framed_logic_ideal == 1” would be “S2 none”. This logic is measured to identify
players who stick with the “framed_logic” but regard the direction buttons as
useless.

The three remaining “logic models” are analogous to the three “logic models”
above; however, they identify players who make the disk “jump edges”. All three
versions of the “no_border” logic models were introduced to identify players who
had obtained the “true rule” that “disks jump edges”.

Assume a ToE game with three disks, the starting rod being the left rod and the
goal rod being the right rod. In this case, the ideal path would lead to “state”
2, the action for “no_border_dir == 1” would be “S1 left”, the action for
“no_border_nodir == 1” would be “S1 left”, “S1 right” or “S1 none”, and the action
for “no_border_ideal == 1” would be “S1 none”.


Table 4.4 (own source) lists all the examples of ToE “logic models” mentioned
above.


Table 4.4 All possible input combinations and the resulting logic-model Booleans


ToE, starting rod: left rod, goal rod: right rod, first move


Individual input | Logic model                  | Boolean (dir/nodir/ideal)
S2 right         | framed_logic_dir/nodir/ideal | 1/1/0
                 | no_border_dir/nodir/ideal    | 0/0/0
S2 left          | framed_logic_dir/nodir/ideal | 0/1/0
                 | no_border_dir/nodir/ideal    | 0/0/0
S2 none          | framed_logic_dir/nodir/ideal | 0/1/1
                 | no_border_dir/nodir/ideal    | 0/0/0
S1 right         | framed_logic_dir/nodir/ideal | 0/0/0
                 | no_border_dir/nodir/ideal    | 0/1/0
S1 left          | framed_logic_dir/nodir/ideal | 0/0/0
                 | no_border_dir/nodir/ideal    | 1/1/0
S1 none          | framed_logic_dir/nodir/ideal | 0/0/0
                 | no_border_dir/nodir/ideal    | 0/1/1
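The Booleans in Table 4.4 follow a simple pattern, which the Python sketch below reproduces under the assumption that the ideal framed-logic and no-border inputs for the current start state are already known: “nodir” only compares disk and steps, “dir” additionally requires the ideal direction, and “ideal” requires direction “none”. The function name `toe_logic_booleans` and the tuple encoding of inputs are hypothetical.

```python
def toe_logic_booleans(action, framed_ideal, no_border_ideal):
    """Return the six ToE logic-model Booleans for one individual action.

    action, framed_ideal, no_border_ideal: (input, direction) tuples such as
    ("S2", "right"); direction "none" means no direction button was pressed.
    """
    chosen_input, chosen_direction = action
    booleans = {}
    for prefix, (ideal_input, ideal_direction) in (
        ("framed_logic", framed_ideal),
        ("no_border", no_border_ideal),
    ):
        same_input = chosen_input == ideal_input
        booleans[f"{prefix}_dir"] = int(same_input and chosen_direction == ideal_direction)
        booleans[f"{prefix}_nodir"] = int(same_input)
        booleans[f"{prefix}_ideal"] = int(same_input and chosen_direction == "none")
    return booleans


# First move with start rod = left and goal rod = right (as in Table 4.4).
FRAMED_IDEAL = ("S2", "right")
NO_BORDER_IDEAL = ("S1", "left")
for inp in ("S2", "S1"):
    for direction in ("right", "left", "none"):
        b = toe_logic_booleans((inp, direction), FRAMED_IDEAL, NO_BORDER_IDEAL)
        fl = "/".join(str(b[f"framed_logic_{k}"]) for k in ("dir", "nodir", "ideal"))
        nb = "/".join(str(b[f"no_border_{k}"]) for k in ("dir", "nodir", "ideal"))
        print(inp, direction, fl, nb)
```

Each printed line corresponds to one row pair of Table 4.4, with the framed-logic triple followed by the no-border triple.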


With the logic models explained, i.e. how they are created and saved and for
which purpose they were introduced, “expected” and “unexpected” states are
discussed in the following.

In a ToE game, unexpected outputs can occur. For this reason, ToE.csv assigns a
Boolean to each individual agent’s action, depending on the output state,
indicating whether (1) or not (0) the individual action led to the expected
outcome. This measurement was introduced to obtain data about individual
decision-making in relation to feedback, in other words, to find out whether
participants alter their strategy when they “do not see what they expected” or
stick to a strategy when they “do see what they expected”. To assess whether an
output state was expected by a participant, several data attributes have to be
known: start_state, output state, input and direction. A data sheet lists all
possible configurations (see e-appendix chapter3_exp_states_appendix_48).

Several “expectation models” exist, with each assigned a Boolean: FL_exp_dir, 
short for “framed logic expectation considering direction”; FL_exp_ideal, short 
for “framed logic expectation not considering direction”; NB_exp_dir, short for 
“no borders logic expectation considering direction” and NB_exp_ideal, short for 
“no border logic expectation not considering direction”. 

For instance, consider a ToE game with the starting state being state 1, as
modelled in the state-space (see Figure 4.4). Some game-group operator results in
an output state equal to state 2, as modelled in the state-space. An input equal
to “S2 right” leads to a Boolean of 1 being assigned to “FL_exp_dir”. An input
equal to “S2 none” leads to a Boolean of 1 being assigned to “FL_exp_ideal”. Both
“expectation models” assume the “framed logic model”. When the “no border logic
model” is assumed, two other “expectation models” are distinguished. For the same
start and output states, an input equal to “S1 left” leads to a Boolean of 1 being
assigned to “NB_exp_dir”, and an input equal to “S1 none” leads to a Boolean of 1
being assigned to “NB_exp_ideal”.
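One possible reading of the four expectation models is sketched below. It assumes a hypothetical rod-per-disk encoding (0 = left, 1 = center, 2 = right; disks ordered small, medium, large), that framed logic expects the disk to stop at the outer rods, that no-border logic only counts moves that jump an edge, and that the “ideal” variants, whose input has direction “none”, accept either direction. These assumptions and the names `expected_rod` and `expectation_booleans` are illustrative and have only been checked against the state 1 to state 2 example; the actual configurations are those listed in the data sheet in the e-appendix.

```python
NUM_RODS = 3  # 0 = left, 1 = center, 2 = right


def expected_rod(rod, steps, direction, borders):
    """Rod the agent is assumed to expect the disk to reach, or None.

    borders=True  -> framed logic: moves beyond an outer rod are not expected.
    borders=False -> no-border logic: only moves that jump an edge count.
    """
    delta = 1 if direction == "right" else -1
    raw = rod + delta * steps
    inside = 0 <= raw < NUM_RODS
    if borders:
        return raw if inside else None
    return None if inside else raw % NUM_RODS


def expectation_booleans(start_state, output_state, disk, steps, direction):
    """Return FL_exp_dir, FL_exp_ideal, NB_exp_dir and NB_exp_ideal (0 or 1)."""

    def matches(borders, directions):
        for d in directions:
            rod = expected_rod(start_state[disk], steps, d, borders)
            if rod is None:
                continue
            expected_state = list(start_state)
            expected_state[disk] = rod
            if tuple(expected_state) == tuple(output_state):
                return 1
        return 0

    if direction == "none":  # "ideal" variants: no direction button pressed
        either = ("left", "right")
        return {"FL_exp_dir": 0, "FL_exp_ideal": matches(True, either),
                "NB_exp_dir": 0, "NB_exp_ideal": matches(False, either)}
    return {"FL_exp_dir": matches(True, (direction,)), "FL_exp_ideal": 0,
            "NB_exp_dir": matches(False, (direction,)), "NB_exp_ideal": 0}


state_1, state_2 = (0, 0, 0), (2, 0, 0)  # assumed encodings of states 1 and 2
print(expectation_booleans(state_1, state_2, 0, 2, "right"))  # FL_exp_dir is 1
print(expectation_booleans(state_1, state_2, 0, 1, "left"))   # NB_exp_dir is 1
```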

How logic models and expectation models are to be interpreted depends on many
factors. To reduce complexity, a heuristic approach is taken: it is assumed that
every input is chosen deliberately. Therefore, the “expectation models” have no
“exp_nodir” variant. This, of course, excludes errors and deviations stemming
from misclicks or non-deliberate inputs. It also makes the data of “expectation
models” meaningless when the participant expected “nothing” or just provided
inputs “randomly”. It is assumed that such deviations occur rarely enough to have
a negligible impact on the overall experiment. However, when a participant shows
many “framed_logic_nodir” actions and next to no “dir” or “ideal” data, the data
of the “expectation models” can be assumed not to be very meaningful for this
participant, as these models assume the direction to be selected with purpose.

Building upon Rubinstein (2007), an action can be considered “reasonless” when
response times are short; in addition, the actions’ logic and expected states can
be analyzed, i.e. a set of actions that jumps between different logic states and
has an expected-state outcome similar to the results of the randomizer bot can
efficiently be regarded as “reasonless”. An action can be considered
“instinctive” when response times are short and, e.g., both the logic and the
expected-state data show that the agent still sticks with the single-player logic
routine, being “framed” by its own mental model. While Rubinstein (2007)
categorized actions intuitively, this thesis follows Rubinstein’s suggestion and
bases the categorization of actions into cognitive, instinctive and reasonless
“on other sources of information”, namely the logic and expected-state data
(Rubinstein, 2007, p. 1258). This also reduces the risk of falsely interpreting
agent deviations from optimal behavior, e.g. caused by simple misclicking, as
non-standard preferences (Cason & Plott, 2014).
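To make the heuristic concrete, the following Python sketch classifies a set of actions as “reasonless”, “instinctive” or “cognitive”. All thresholds (`rt_threshold_s`, `switch_threshold`, `tolerance`), the summary fields of the hypothetical `ActionSetSummary` class and the choice of “cognitive” as the remaining category are placeholders for illustration; the text does not specify numeric cut-offs.

```python
from dataclasses import dataclass


@dataclass
class ActionSetSummary:
    median_response_time_s: float        # median response time of the actions
    logic_switch_rate: float             # share of consecutive actions that change logic model
    expected_state_rate: float           # share of actions that led to the expected state
    sticks_to_single_player_logic: bool  # e.g. mostly framed_logic_dir == 1


def classify_actions(summary, bot_expected_state_rate,
                     rt_threshold_s=2.0, switch_threshold=0.5, tolerance=0.05):
    """Label a set of actions; all thresholds are illustrative, not empirical."""
    short_response = summary.median_response_time_s < rt_threshold_s
    random_like = abs(summary.expected_state_rate - bot_expected_state_rate) <= tolerance
    if short_response and summary.logic_switch_rate >= switch_threshold and random_like:
        return "reasonless"
    if short_response and summary.sticks_to_single_player_logic:
        return "instinctive"
    return "cognitive"  # assumed default for all remaining action sets
```

Keeping the thresholds as parameters makes it straightforward to test how sensitive the resulting categories are to the chosen cut-offs.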

This concludes all logic and expected-state models. With all raw data and their
creation explained, the next sub-chapter focuses on the participants who took
part in the main experiment.


4.2 Participants 


180 Amazon Mechanical Turk workers (MTurks) were recruited via “Amazon
Mechanical Turk” (AMT), where online freelancers can be hired for various tasks,
such as online questionnaires and experiments. Of these 180 participants, the data
of 87 MTurks could be used for statistical analysis. As indicated in Strunz &
Chlupsa (2019), MTurks “are commonly recruited for behavioral experiments due
to AMT’s workers pool size, low costs and being able to produce high-quality
data fast (Buhrmester, Kwang, & Gosling, 2011)” (p. 114). Freelancers recruited
via this platform show bias and heuristic behavior comparable to participants
recruited by more traditional methods (Paolacci, Chandler, & Ipeirotis, 2010).
MTurks are mainly motivated by monetary compensation (Lovett et al., 2018), such
that realistic working conditions, in which thinking time is associated with
costs, can be simulated with these participants.

Culture may influence complex problem solving and adaptive decision making
(Güss, 2011; Güss et al., 2012), with differences that are supposed to stem from
different learning environments (Funke, 2014). Highly significant differences in
non-routine problem solving performance and response times by country of origin
have been measured when comparing 290 Indian, 262 US-American and 51 German
participants via the “Flag Run” experiment (Strunz, 2019). For this reason,
recruitment of all 180 MTurks was restricted to US American MTurks. In order to
ensure that MTurks were actually human and not automatically working machines,
so-called “bots”, and to ensure high-quality data, the required approval rating,
reflecting the MTurk’s “reputation”, was set to a “high level”. High levels of
MTurk reputation are defined as an approval rating above 95% (Peer, Vosgerau, &
Acquisti, 2013). When a task, referred to as a “HIT”, was opened to US American
freelancers with a mandatory HIT approval rate higher than 95%, 11,126 freelancers
were “captured” in a study from 2015 (Stewart et al., 2015), and a more recent
study stated that there are 12,000 MTurk freelancers on average (Difallah,
Filatova, & Ipeirotis, 2018). However, according to Difallah, Filatova &
Ipeirotis (2018), these numbers are extreme underestimates due to variation; when
correcting for propensities, at least 100,000 to 200,000 freelancers are actively
working.

Even though no differences in NPS performance were measured regarding sex, and
only a low correlation with age was found in Strunz & Chlupsa (2019), age and sex
were again asked for during the login stage, as the reflective cognitive state has
been described as being influenced by age (Liebherr, Schiebener, Averbeck, &
Brand, 2017) and as female participants have been shown to switch to a better
strategy less efficiently in experiments under feedback (Casal, DellaValle,
Mittone, & Soraperra, 2017).

As MTurks’ behavior varies over the course of the day and the week, with
participants behaving less reflectively on weekends (Arechar, Kraft-Todd, & Rand,
2017), the final study was conducted on a regular working day, the 6th of
December. According to an online tracker showing hourly demographics of AMT
workers (Difallah et al., 2018; Paolacci et al., 2010), the most recent data
available at the time the experiment was conducted showed that the share of US
American freelancers varied between 52.58% and 85.42% throughout September 2019.
Thus, the majority of MTurks were US Americans. Regarding sex, 49.04% female and
50.96% male US American MTurks participated from September 1st 2019 to September
30th 2019 according to the online tracker (Paolacci et al., 2010), indicating a
well-balanced monthly sex distribution. Data for December, when the experiment
was conducted, were not available at the time of research.

The age distribution on workdays (Monday to Friday) from September 1st 2019 to
September 30th 2019, retrieved from the online tracking tool (Paolacci et al.,
2010), is listed in Table 4.5.

MTurk demographics from 2018 (Difallah et al., 2018) reported that 55% of US
participants were female and 45% were male. Household income for US MTurks was
found to be below the average of the US population, with the US household median
being “$57K” and the US MTurks household median being “around $47K”, and “while
26.5% of US households make more than $100K per year, for MTurk workers this
percentage falls at 12.5%.” (Difallah et al., 2018, p. 4).
Table 4.5 Year of Birth distribution of MTurks on workdays Mo-Fr, from
01.09.2019 to 30.09.2019. Source: data acquired via the online tracker
(Paolacci et al., 2010)


Year of Birth | Percentage  | Age (as of 10/2019)
2000-2010     | 1.268%      | 9-18
1990-2000     | 34.232%     | 19-28
1980-1990     | [illegible] | 29-38
1970-1980     | 14.758%     | 39-48
1960-1970     | 9.46%       | 49-58
1950-1960     | 2.604%      | 59-68
1940-1950     | 0.598%      | 69-78
1910-1940     | 0.21%       | 79-108


Of the 87 MTurks, 29 self-reported as female and 58 as male, with an average age
of 33.16 years across both sexes.

Researchers recommend actively monitoring the MTurk forums in order to find out
whether a HIT was discussed amongst MTurks, which could have a negative influence
on the experiment’s data quality (Cheung, Burns, Sinclair, & Sliter, 2017). When a
pre-test of the main experiment was performed, minor information about the
experiment was found to be shared online. A single participant rated the
experiment as “fair” but also noted the disadvantage that one partner’s poor
performance had made them wait longer than necessary. Information being shared
online cannot be avoided. For this reason, an after-survey was included, asking
participants whether they had already participated in this experiment before.

As most participants are informed anyway that they play in a group, as the
information shared online was observed to be very limited, and as certainly not
all MTurks actively monitor the MTurk forums, treatment diffusion effects are
regarded as potentially low. For this reason, the benefits of more transparency
were regarded as outweighing its potential negative side-effects, and an official
profile was created on “TurkerView”, where MTurks are able to retrieve information
about the experimenter’s former payments, communication, number of rejections,
approval response times and number of blocked participants.

As studies have found that 40% of MTurks work with “Amazon Mechanical Turk” as
their primary job, the practical recommendation to act as “reputable employers”
was followed (Brawley & Pury, 2016, p. 542), and more than 45 USD per hour was
paid to MTurks on average over the course of 26 HITs. Since “unfair wages, and
inaccurately listed time requirements were among the top five worst Requester
behaviors” (Brawley & Pury, 2016, p. 542), the MTurks’ average pay was always
calculated to lie well above the US minimum wage while experience from early
experiments was still missing. This explains the high average hourly pay.

In conclusion, of 180 participants, the data of 87 US American MTurks, recruited
at random from an online pool of potential participants, were used. How many
MTurks can ultimately be reached is debated and depends on the model used to
approximate it. According to the literature, certainly more than 11,000
freelancers were reached via “Amazon Mechanical Turk”, and the number could
extend to more than 100,000. In order to rely on the most recent statistical
results, data from September 2019 were used to determine the US female/male
distribution and the age distribution during working days.


4.3 Procedure 


Participants were provided with in-depth instructions and a text field in which
their participant ID was to be entered, as shown in Figure 4.4.


[Screenshot of the HIT: “READ Instructions HERE, 2-3 Minutes! (Click to expand)”;
survey link https://www.curiosity-data.com; field “Provide the survey code here:”]


Figure 4.4 MTurk client-side view of the HIT. Source: own source


Upon having clicked on “(Click to expand)” each participant was provided 
with the following instructions: 


Survey Instructions reading time: 2-3 minutes. Trouble Shooting section 
included. Make sure to read. 
Complete an online experiment consisting of 12 levels of Tower of 
Hanoi. Instructions are included ingame. Bonus pay for best 10%. 

= When experiment lasts longer than 51 minutes, submit with Worker 
ID and time played, you will be approved if you did not idle on purpose. = 
Go to https://www.curiosity-data.com/ and enter 1992 as “Session
Code”. 

Please provide us with your sex and age. You will be given an ID at 
the end. Enter this ID as your Surveycode. Do not provide me with your 
worker ID. 

This experiment may easily last longer than 30 minutes. Do not start this 
HIT when you do not have enough time. 

You may have to wait up to 10 min at the beginning. 

You may have to wait up to 14 min during the experiment. 

Do not leave the game unattended. If you are kicked due to inactivity, I 
will under no circumstances approve your work. 

Check the information box on the left of your screen during the game. 
Its contents may change and are important. 

Make sure to leave this window open as you complete the survey. When 
you are finished, you will return to this page to paste the ID into the 
box. Not your worker ID. 

About me: 

I am registered on TurkerView (Ulrich Strunz), if you want to leave a 
rating. 

You can easily reach me via email, I will answer. 

My experiments are unique. Thanks for helping me out. 

Compensation: 

In case you are unable to submit in time, I offer compensation in some 
cases. My time is limited. I am also human. Please be patient in this case, 
I am working with hundreds of MTurks simultaneously, alone. Leave me a 
reminder Email in case you did not receive funds. Screenshots help, so you 
can prove your progress. 

= When experiment lasts longer than 51 minutes, submit with Worker 
ID and time played, you will be approved if you did not idle on purpose. = 

Please do not spam me with multiple emails, I will listen to your 
explanations in case something went wrong. 

Survey: 

There is a pre-survey included. Please make sure to answer it. I need to 
know if you are from USA. 

There is an after-survey included. Please make sure to answer it. I need 
to know if you have played this experiment before. 
Experiment:

Using a tablet or notebook will be ideal. Mobile phones might have a 
too small screen to display all information properly. 

The experiment is not bugged. It has been tested with more than 200 
participants by now. I have no influence over the setup you are using. Old 
hardware or missing drivers may result in bad latency. Check the trouble 
shooting section for more details. 

Trouble-Shooting: 

!!! Some MTurks experience problems when using Google Chrome 
since its last update. Clearing your cache might be necessary before 
starting the game. !!! 

The game has been tested with Chrome, IE, Firefox, Yandex Browsers. 
No trouble was experienced. 

In case you accidentally close your browser, just come back. Your 
experiment progress will be saved. 

Several Turkers reported a problem with the publish button not working. 
This is an AMT specific problem. The best option is to: 

1) Inform me via Email when an error occurs. 

2) Wait. Sometimes the button will function after 5 min of waiting time. 

In case the proceed button does not work: 

1) Make sure you have typed in 1992 as session code. 

2) Make sure to have chosen your SEX using the drop-down menu and 
have provided us with your AGE using integers. 

3) Since the latest Chrome update, some unsolvable (from my side) 
issues were reported, when using this Web-Browser. 


After going to https://www.curiosity-data.com/, participants had to self-report
their sex and age. After a valid input, they could start the experiment by clicking
the “Proceed” button. The entire experiment consisted of a Tower of Hanoi/Tower
of Europe experiment and an after-survey. It comprised five information
conditions, represented by five experiment groups. Each participant was assigned
automatically to one of the five experiment groups by login order, as explained in
the previous chapter. Participants were likewise assigned automatically to game
groups of three players by login order, as explained in the previous chapter. Each
information condition (experiment group) consisted of 13 games or levels. The
first 7 games were single-player games; the last 6 games were multiplayer games.
The first game was added to give players enough time to read the popup and
help-text information, and data collected during it was not used for analysis; it was
considered a buffer level. The first popup appeared in game 1, and the second popup
appeared at game 8. The goal rod changed within both the single-player and the
multiplayer phase. Each game was played with three rods, referred to as the left,
center, or right rod. During the first four single-player games (buffer level
included), the goal rod was the right rod; during the last three single-player games,
it was the center rod. The first three multiplayer games were played with the right
rod as the goal rod, while the last three multiplayer games used the center rod. All
games in all information conditions were played with three disks. Figure 4.5 (own
source) shows the entire setup.
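As an illustration only, the 13-game schedule described above can be encoded as a small data structure that makes the goal-rod switches, timers, and the excluded buffer level explicit. This is a minimal Python sketch, not the Curiosity IO configuration: the names `Game`, `build_schedule`, and `SCHEDULE` are hypothetical, and the rod numbering 1 = left, 2 = center, 3 = right is an assumption read off the Goal column of Figure 4.5.

```python
from dataclasses import dataclass

LEFT, CENTER, RIGHT = 1, 2, 3  # assumed rod numbering, read off Figure 4.5


@dataclass(frozen=True)
class Game:
    number: int      # 1..13, as numbered in Figure 4.5
    mode: str        # "single" or "multi"
    goal_rod: int    # LEFT, CENTER or RIGHT
    discs: int       # always 3 in this experiment
    popup: bool      # popups appear at games 1 and 8
    timer_s: int     # preset timer: 2 min single-player, 3 min multiplayer
    analysed: bool   # game 1 is a buffer level and is not analysed


def build_schedule() -> list:
    """Return the 13-game schedule shared by all five information conditions."""
    games = []
    for n in range(1, 14):
        mode = "single" if n <= 7 else "multi"
        # Right rod for games 1-4 and 8-10, center rod for games 5-7 and 11-13.
        goal = RIGHT if n <= 4 or 8 <= n <= 10 else CENTER
        games.append(Game(
            number=n,
            mode=mode,
            goal_rod=goal,
            discs=3,
            popup=n in (1, 8),
            timer_s=120 if mode == "single" else 180,
            analysed=n != 1,
        ))
    return games


SCHEDULE = build_schedule()
```

For example, `[g.number for g in SCHEDULE if g.mode == "multi" and g.goal_rod == CENTER]` yields `[11, 12, 13]`, the three multiplayer games played with the center rod as goal rod.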

For each single-player game, a timer of 2 minutes was preset. When the timer
ran out, the level ended automatically and a “Level Completed” screen appeared,
which closed automatically after 1 second and was followed by the next game
screen. The single-player timer started automatically as soon as the game screen
was shown.

For each multiplayer game, a timer of 3 minutes was preset. When the timer
ran out, the level ended in the same way, with a “Level Completed” screen that
closed automatically after 1 second and was followed by the next game screen. If
the last game (game 13) ended in this way, the after-survey was displayed
automatically.

When a game group had not been filled with three participants before the
experiment started, a waiting screen appeared. After 5 minutes of waiting, an
automated bot participant was added to the group. If a player left a game group
during the game and Curiosity IO registered this player as disconnected, a bot was
added to the group after 10 minutes. This feature was implemented for ethical
reasons, so that MTurks were still able to complete the experiment in time. When
a game group was filled with three agents, the actual experiment started with the
first game level.
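The two waiting rules above can be condensed into a single predicate. The following is a minimal Python sketch under stated assumptions: the function `bot_should_join` and its parameters are hypothetical, times are measured in seconds since waiting began (before the start) or since a disconnect was registered (during play), and this is not the actual Curiosity IO implementation.

```python
PRE_START_WAIT_S = 5 * 60    # bot joins after 5 min on the waiting screen
DISCONNECT_WAIT_S = 10 * 60  # bot replaces a disconnected player after 10 min
GROUP_SIZE = 3


def bot_should_join(connected: int, started: bool, waited_s: float) -> bool:
    """Decide whether a bot should be added to an incomplete game group.

    connected: number of agents currently connected to the group.
    started:   False while the group waits for the experiment to start,
               True once play has begun (i.e. a player has disconnected).
    waited_s:  seconds since waiting began or since the disconnect was registered.
    """
    if connected >= GROUP_SIZE:
        return False
    limit = DISCONNECT_WAIT_S if started else PRE_START_WAIT_S
    return waited_s >= limit
```

For instance, a two-player group that has waited 5 minutes before the start would receive a bot, whereas a group that lost a player 5 minutes into play would still be waiting for the 10-minute limit.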

The first level, labeled “Game 0” in the instructions, included a popup with the following message:


“--- Do not worry about the timer. Take your time to read the following! --- 
Your task is to solve 12 games of Tower of Hanoi. 

6 training games, and then 6 performance games. An additional game 
(this game) is added, so you can read these instructions. 

For each game a timer will be displayed. When the timer reaches zero, 
the next game will automatically start. 
Game    | Discs | Start | Goal | Type   | HelpText | Popup
Game 1  | 3     | 1     | 3    | single | Yes      | Yes
Game 2  | 3     | 1     | 3    | single | Yes      | No
Game 3  | 3     | 1     | 3    | single | Yes      | No
Game 4  | 3     | 1     | 3    | single | Yes      | No
Game 5  | 3     | 1     | 2    | single | Yes      | No
Game 6  | 3     | 1     | 2    | single | Yes      | No
Game 7  | 3     | 1     | 2    | single | Yes      | No
Game 8  | 3     | 1     | 3    | multi  | Yes      | Yes
Game 9  | 3     | 1     | 3    | multi  | Yes      | No
Game 10 | 3     | 1     | 3    | multi  | Yes      | No
Game 11 | 3     | 1     | 2    | multi  | Yes      | No
Game 12 | 3     | 1     | 2    | multi  | Yes      | No
Game 13 | 3     | 1     | 2    | multi  | Yes      | No


Figure 4.5 Administrator perspective of the entire experimental setup using the Curiosity IO
framework (rods: 1 = left, 2 = center, 3 = right). Source: own source


Try to solve each level in as few steps as possible. The best 10% of all 
participants will win a 2.00 USD bonus (only if you provide me with the 
ID displayed at the end of the experiment, do NOT provide me with your 
worker ID). 

Your performance will not be measured during the first 6 practice games. 

Your performance will be measured during the 6 performance games. 

Important: 

1) During performance games, the timer will start AFTER your first 
move. So you can take your time reading pop-up information. 

2) Pay close attention to the instructions on the left-hand side as they 
might change. A pop-up will be displayed when additional information is 
added to the instructions, to make sure you notice the change. 

3) Every piece of information displayed is true. You can trust all written 
information.” 


Instructions displayed on the left side included the following text: 
“Instruction Game 0
(no performance measured) 

The objective of the puzzle is to move the entire stack to the indicated 
goal rod, either center or right rod, obeying the following simple rules: 

Only one disk can be moved at a time. 

Each move consists of taking the upper disk from one of the stacks and 
placing it on top of another stack or on an empty rod. 

No larger disk may be placed on top of a smaller disk. 

Click on the disk you want to move first. Drag-and-Drop does not work. 
After that, you will have to figure out the rest for yourself. 

With 3 disks, the puzzle can be solved in 7 moves. 

Additional information: 

No additional information so far.” 


The instruction text did not change throughout the first 7 games, except for the
number of the level currently being played.

A second popup appeared in all five conditions at the start of game 8, from which
point the instruction texts differed between conditions. In all conditions, the
heading of the instruction text was altered as follows:


“Performance Game 7 (performance is measured)”,


thereby informing participants that their performance would now be evaluated in
the following levels.

Instruction texts then differed between the five conditions (experiment groups,
EG) in the “Additional Information:” part. The second popup differed from the
first popup by stating “You are now starting with 6 performance games.” and, with
the exception of experiment group 1, it also contained the warning phrase
“Attention! Additional information was added to the instructions. Make sure to
read!”. This warning was implemented to make sure that participants actually read
the additional information. The additional information was written in capital
letters to induce disfluency and thereby enhance the chances of “promoting more
comprehensive consideration of opposing views”, as disfluency in writing style has
been shown to disrupt confirmation bias (Hernandez & Preston, 2013, p. 178).

The instruction texts are summarized in Table 4.6.
Table 4.6 Instruction texts for the respective information conditions in the “ill-defined” stages.
Source: own source


EG | Game 8 instruction text, “Additional Information:” | Warning

1 | “No additional information.” | No

2 | “YOU ARE PLAYING IN A TEAM OF THREE HUMANS DURING THE NEXT 6 GAMES. YOU ALL HAVE INFLUENCE ON THE MOVEMENT OF THE DISCS AND SHARE CONTROL OVER THE GAME ACCORDING TO HIDDEN RULES. THE RULES DO NOT CHANGE DURING THE NEXT 6 GAMES.” | Yes

3 | “YOU ARE PLAYING IN A TEAM OF THREE HUMANS DURING THE NEXT 6 GAMES. YOU ALL HAVE INFLUENCE ON THE MOVEMENT OF THE DISCS AND SHARE CONTROL OVER THE GAME ACCORDING TO HIDDEN RULES. THE RULES DO NOT CHANGE DURING THE NEXT 6 GAMES. SINCE YOU CANNOT COMMUNICATE WITH EACH OTHER, IT IS HIGHLY UNLIKELY FOR YOU TO FIND OUT THESE RULES.” | Yes

4 | “YOU ARE PLAYING IN A TEAM OF THREE HUMANS DURING THE NEXT 6 GAMES. YOU ALL HAVE INFLUENCE ON THE MOVEMENT OF THE DISCS AND SHARE CONTROL OVER THE GAME ACCORDING TO HIDDEN RULES. THE RULES DO NOT CHANGE DURING THE NEXT 6 GAMES. DURING THE NEXT 6 GAMES THE DIRECTIONAL BUTTONS DO NOT INFLUENCE THE GAME AT ALL. ALL THEY DO IS CHANGE COLOR WHEN BEING PRESSED.” | Yes

5 | “YOU ARE PLAYING IN A TEAM OF THREE HUMANS DURING THE NEXT 6 GAMES. YOU ALL HAVE INFLUENCE ON THE MOVEMENT OF THE DISCS AND SHARE CONTROL OVER THE GAME ACCORDING TO HIDDEN RULES. THE RULES DO NOT CHANGE DURING THE NEXT 6 GAMES. SINCE YOU CANNOT COMMUNICATE WITH EACH OTHER, IT IS HIGHLY UNLIKELY FOR YOU TO FIND OUT THESE RULES. DURING THE NEXT 6 GAMES THE DIRECTIONAL BUTTONS DO NOT INFLUENCE THE GAME AT ALL. ALL THEY DO IS CHANGE COLOR WHEN BEING PRESSED.” | Yes


From game 8 onward, participants played 6 rounds of Tower of Europe (ToE) in the
same manner as explained in the previous chapter. Each experiment group received
different additional information, defining the five information conditions
explained in the following.

The first information condition (EG: 1) did not contain any further information.
Participants were not informed that they now shared control with two additional
agents. This information condition is hereafter referred to as the “no information
condition” (N-IC).

The second information condition (EG: 2) informed participants that they shared
control during all 6 performance games with two other agents according to hidden
rules that would not change. This information condition is hereafter referred to as
the “GDM information condition” (G-IC).

The third information condition (EG: 3) informed participants that they shared
control during all 6 performance games with two other agents according to hidden
rules that would not change. Participants also received the “discouraging”
information that, owing to the lack of communication, these hidden rules would
likely remain hidden. This information condition is hereafter referred to as the
“disillusioning information condition” (D-IC).

The fourth information condition (EG: 4) informed participants that they shared
control during all 6 performance games with two other agents according to hidden
rules that would not change. Participants were also informed that the directional
buttons had no function besides changing color when pressed. This information
condition is hereafter referred to as the “routine information condition” (R-IC).

The fifth information condition (EG: 5) contained all additional information from
G-IC, D-IC, and R-IC. This information condition is hereafter referred to as the
“combined information condition” (C-IC).

The additional information was displayed throughout all ToE games and did not
disappear or change at any moment.

After the 6 ToE games, participants had to answer an after-survey consisting of the
single question “Have you done the experiment ‘Flag Run’ before?”, which they
could answer with either “Yes” or “No”. The experiment then ended, and
participants were provided with their ID.

The following chapter derives the hypotheses, lists the dependent and independent
variables, and describes how the data was treated.
Specific Research Objectives


The purpose of the experiment was to create a decision-making domain related to a
VUCA domain, in which agents had to solve a complex problem, and to analyze how
their behavior changed when they were provided with different global information.
Another major aspect of the experiment was to first “train” the agents in isolated
decision-making (routine strategy) and then group them randomly into a “game” of
three agents. The agents were unable to communicate and did not receive
information about the former actions of their co-agents, but they were always able
to see the collective outcome of their shared control over the game. The following
research questions are to be answered:


1) Does public information about environmental change (“You are sharing control
with humans!”) favor a change of routine strategy when such new environmental
conditions do not influence the routine strategy’s performance?

2) Does an influence of environmental change (the middle rod is the goal rod) on
the routine strategy’s performance favor a change of routine strategy?

3) Will the deviation distance from the routine strategy depend on the type of
public information, i.e., will information about man-made uncertainty lead to a
higher deviation from the routine strategy than unspecified uncertainty (no further
public information)?

4) Will public information about hidden rules favor overcoming parts of the
routine strategy?

5) Is group performance in the complex problem-solving game dependent on
individual decision-making expertise in the routine strategy, when the routine
strategy statistically benefits the group’s performance in a game where no
communication is possible?


5.1 Derivation of Hypotheses 


The first research question of this thesis asks whether the information provided in
G-IC, D-IC, R-IC, and C-IC leads participants to change the strategy they used to
solve the ToH games during ToE games 8, 9, and 10. ToE games 8, 9, and 10 can be
solved in seven steps with certainty when all agents of a game group stick to the
framed logic, and with high probability when all agents mostly stick to the framed
logic. Therefore, sticking with the framed logic during the first three ToE games
solves the GDM problem under uncertainty efficiently. As experimental results
showed that individual strategies are mostly altered by environmental change only
if such change influences the strategy’s performance, it was assumed that
participants who had demonstrated a “framed logic” routine and ToH expertise were
unlikely to change their strategy in such a game group.

The participants’ “routine” was derived from the proportions of “Framed Logic”
and “No Border Logic” level-solving moves or actions. By definition, an action that
solves a ToH game is always either a “Framed Logic” or a “No Border Logic” action,
and never both. When a level was solved by neither, the timer had run out. If a
participant had more “Framed Logic” (F-L) or more “No Border Logic” (NB-L)
actions among the actions that solved ToH levels, the corresponding logic was taken
to be the routine strategy. When a participant had equal numbers of F-L and NB-L
level-solving actions, the routine strategy was regarded as unclear and therefore
reported as “mixed” (Mx-L).
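The classification rule above amounts to a majority vote over the logic labels of the level-solving actions. A minimal Python sketch, assuming each solved level is represented by the label "F-L" or "NB-L" of its solving action; the function name `routine_strategy` is hypothetical. Note that a participant with no solved level ends up as "Mx-L" here, an edge case the text does not address.

```python
from collections import Counter


def routine_strategy(solving_logics: list) -> str:
    """Classify a participant's routine strategy from ToH games 2 to 7.

    solving_logics: one label per solved level, "F-L" or "NB-L", taken from the
    action that solved the level. Levels ended by the timer contribute nothing.
    """
    counts = Counter(solving_logics)
    fl, nbl = counts["F-L"], counts["NB-L"]
    if fl > nbl:
        return "F-L"
    if nbl > fl:
        return "NB-L"
    return "Mx-L"  # equal counts: routine regarded as unclear ("mixed")
```

For example, six F-L level-solving actions yield "F-L", while one F-L and one NB-L action yield "Mx-L".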

Each individual’s ToH expertise was expressed as an index derived from the
participant’s performance in ToH games 2 to 7. ToH expertise was measured using
the following parameters collected from these levels:


— How many ToH levels were solved?
— Did participants solve at least one ToH game in 7 steps?
— What was the least number of steps required in any ToH game?
— How many ToH games were solved in 7 steps?
— How many steps in total were required to solve the ToH games?
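The five parameters listed above can be computed from one participant's per-game step counts. A minimal Python sketch, assuming each of ToH games 2 to 7 is recorded as the number of steps used or `None` if the timer ran out; the function name and dictionary keys are hypothetical. Following the pretest reporting ("Steps in total, average, n = 17"), the total is only reported when every game was solved. The per-game split in the usage line is invented, chosen only so that its totals match the example in Table 5.1.

```python
from typing import List, Optional


def expertise_parameters(steps: List[Optional[int]]) -> dict:
    """Summarize one participant's ToH games 2 to 7.

    steps: one entry per game, either the number of steps used to solve it
    or None if the timer ran out before the game was solved.
    """
    solved = [s for s in steps if s is not None]
    all_solved = len(solved) == len(steps)
    return {
        "solved_games": len(solved),
        "failed_games": len(steps) - len(solved),
        "any_7_step_game": 7 in solved,
        "least_steps": min(solved) if solved else None,
        "seven_step_games": solved.count(7),
        # Totals are only comparable when every game was solved.
        "total_steps": sum(solved) if all_solved else None,
    }


# Hypothetical per-game split whose totals match the example in Table 5.1.
example = expertise_parameters([7, 7, 9, 11, 13, 13])
```

Here `example` reports 0 failed games, a least step count of 7, two 7-step games, and 60 steps in total.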


Ideally, if an agent solved all six ToH levels (excluding the first game) in 7 steps
using F-L, this agent would have demonstrated the highest level of expertise and
shown F-L to be the routine strategy. If an agent solved all these ToH levels in 7
steps using NB-L, this agent would likewise have demonstrated the highest level of
expertise and shown NB-L to be the routine strategy.

Expertise in the F-L routine was expected to have the side effect that game groups
with high levels of F-L expert knowledge would solve the first three ToE levels more
efficiently. The ToE rules were therefore not expected to affect the performance of
strategies stemming from a framed-logic routine, but they were expected to affect
the performance of strategies stemming from a no-border-logic routine. Information
about routine strategy and expertise levels was saved for each participant. Table 5.1
(own source) shows, by example, all data mentioned above, expressing the routine
strategy used and the corresponding expertise.


Table 5.1 Results of example experiment for explanation, part 1

Information | Value
Strategy proportion | F-L (6), NB-L (0)
Routine strategy (F-L/NB-L/Mx-L) | F-L
Number of failed ToH games | 0
Least number of steps required | 7
Number of 7-steps games | 2
Steps in total | 60


To create expertise categories, a pretest with 30 participants, all US-American
MTurks, was conducted using the same setup as the main experiment. Three
participants had idled throughout the entire experiment and were not considered.
The routine-strategy results of the remaining 27 US-American MTurks were used,
and the corresponding expertise information is summarized in Table 5.2 (own
source).

The average strategy proportions are based on 189 level-solving actions out of 1623
actions in total. Of these 189 level-solving actions by 27 participants, only 4 NB-L
actions solved a game, performed by three distinct players. By definition, all 27
players therefore used F-L as their routine strategy.

17 out of 27 participants completed all ToH games in time, without failing a single
game. 8 out of 27 participants failed one ToH game by not completing it in time.
Two out of 27 participants failed three ToH games by not completing them in time.

24 out of 27 participants completed at least one ToH game in 7 steps. The other
three participants did not solve any ToH game in 7 steps; their best games required
8, 9, and 10 steps.
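The average "least number of steps required" reported in Table 5.2 (7.22) follows directly from these counts. A short worked check in Python, assuming the average is taken over the best game of each of the 27 pretest participants; the variable names are illustrative.

```python
# Best (least) step count of each of the 27 pretest participants, as reported:
# 24 reached a 7-step game, the remaining three needed 8, 9 and 10 steps.
best_steps = [7] * 24 + [8, 9, 10]

average_least_steps = sum(best_steps) / len(best_steps)  # 195 / 27
print(round(average_least_steps, 2))  # 7.22
```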


Table 5.2 Results of example experiment for explanation, part 2

Information (n = 27) | Value (on average)
Strategy proportion, average | F-L (5.33), NB-L (0.15)
Routine strategy (F-L/NB-L/Mx-L) | all used F-L
Number of failed ToH games, average | 0.59
Least number of steps required, average | 7.22
Number of 7-steps games, average | 2.81
Steps in total, average (n = 17) | 53.53


Results regarding the number of 7-step games achieved are listed in Table 5.3 (own
source). 15 out of 27 participants achieved between three and six perfect 7-step
ToH games. Only one out of 27 participants managed to solve all ToH games in 7
actions.

When participants who failed at least one ToH game are excluded, the remaining 17
participants required on average 53.53 actions in total to solve all 6 ToH games.
The two participants who did not complete any ToH game in 7 steps but completed
all stages required 65 and 72 steps in total.

None of the participants who solved 5 or 6 ToH games in 7 actions failed a ToH
game. Only one participant who solved 4 ToH games in 7 actions failed at least one
ToH game. Two participants who solved three ToH games in 7 actions failed at least
one ToH game, as did five participants who solved two ToH games in 7 actions and
two participants who solved none or just one ToH game in 7 actions. No participant
who required 60 or more steps in total to solve all ToH games managed to solve
more than two games in just 7 actions.

From these results, three expertise categories were established using the number
of ToH games solved in 7 steps and the number of failed ToH games. The highest
expert rank is assigned to participants solving 4 or more ToH games in 7 actions
and failing no more than one ToH game. The medium expert rank is assigned to
participants solving two or three ToH games in 7 actions and failing no more than
one ToH game. The low expert rank is assigned to participants solving none or one
ToH game in 7 actions.


Table 5.3 Results of example experiment for explanation, part 3 (columns: Number of 7-steps ToH games | Number of agents, n = 27)
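The ranking rules above can be sketched as a small decision function. This is a minimal Python sketch; the name `expertise_rank` is hypothetical. One assumption is needed: the stated rules do not cover a participant with two or more 7-step games who failed more than one game. The reported counts (8 failed games within the low group, all 27 participants ranked) suggest such participants were ranked low, so the sketch does that, but this is an inference rather than a rule stated in the text.

```python
def expertise_rank(seven_step_games: int, failed_games: int) -> str:
    """Assign the pretest expertise rank "H", "M" or "L"."""
    if failed_games > 1:
        # Assumption: the rules above do not cover participants with two or more
        # 7-step games but more than one failed game; they are ranked low here.
        return "L"
    if seven_step_games >= 4:
        return "H"
    if seven_step_games >= 2:
        return "M"
    return "L"  # none or one 7-step game
```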

By this definition, all 27 participants were assigned the routine strategy “F-L”.
10 out of 27 participants from the pretest were assigned a “high” expert rank, 10
out of 27 a “medium” rank, and 7 out of 27 a “low” rank.

The 7 low-expertise (L) participants collectively failed 8 ToH games in total. The
10 medium-expertise (M) participants collectively failed 5 ToH games in total. The
10 high-expertise (H) participants collectively failed only one ToH game. Of these
27 participants, only 15 produced usable data, as 12 participants were either part of
a game group with a bot agent, disconnected, or part of a game group with players
who idled throughout the single-player phase. The 15 remaining participants formed
five game groups: two in the G-IC, two in the D-IC, and one in the R-IC. The
corresponding expertise levels are listed in Table 5.5 (own source).

In the small pretest, only 50 % of the data (15 of 30 participants) could be used for
analysis. Therefore, a rather large number of participants was expected to be
required for the main experiment. It was estimated that about 6000 human agents
would be required for more than 180 game groups per condition. Accordingly, 300
participants were expected to produce 10 game groups per condition. Even with
6000 human agents, the analyses would still have been limited by many factors,
which are discussed in chapter 5.
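The recruitment figures above are consistent with a simple calculation, assuming three agents per game group, five information conditions, and a usable-data fraction of 0.5 taken from the pretest (15 of 30 participants). The function name `participants_needed` and the rounding up are illustrative assumptions; the text does not state the exact formula.

```python
import math


def participants_needed(groups_per_condition: int,
                        conditions: int = 5,
                        agents_per_group: int = 3,
                        usable_fraction: float = 0.5) -> int:
    """Estimate how many participants must be recruited to obtain the
    desired number of usable game groups in every information condition."""
    usable_agents = groups_per_condition * conditions * agents_per_group
    return math.ceil(usable_agents / usable_fraction)
```

With these assumptions, 10 groups per condition require 300 participants, 180 groups require 5,400, and 200 groups require 6,000, which matches the order of magnitude stated above.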

Because participants were assigned a bot agent after 5 minutes of waiting time for
ethical reasons, a game group that contained a bot agent and belonged to any
information condition other than N-IC was considered a “deception” condition:
participants in such game groups had been informed that they were playing with
“human agents”. Such game groups were therefore considered “deceptive” and were
fully excluded from data analysis. To increase the chances of filling game groups
with human agents, the main experiment was divided into several parts, each
collecting US-American MTurks at a different time of day.
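The exclusion rule for deceptive game groups can be expressed as a simple filter. A minimal Python sketch; the `GameGroup` record and both function names are hypothetical, and only the deception criterion from this paragraph is encoded (other exclusions, such as disconnects or idling players, are not).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameGroup:
    group_id: int
    condition: str   # "N-IC", "G-IC", "D-IC", "R-IC" or "C-IC"
    has_bot: bool    # True if at least one bot agent joined the group


def is_deceptive(group: GameGroup) -> bool:
    # Outside N-IC participants were told they play with humans,
    # so any bot in the group turns it into a deception condition.
    return group.has_bot and group.condition != "N-IC"


def analysable(groups):
    """Keep only groups that are not excluded under the deception rule."""
    return [g for g in groups if not is_deceptive(g)]
```

A bot-containing N-IC group therefore stays in the data, whereas the same group in G-IC would be dropped.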

Based on the pretest results, group expertise levels were expected to be mixed: of
five game groups, four showed distinct levels of group expertise. Ten different
group expertise levels are possible, rated from “1” for “L, L, L” to “10” for “H, H,
H”. Group expertise was expected to be normally distributed, in line with
experimental studies showing that, while repetition leads to better strategy use,
participants differ greatly in their individual ability to learn ToH rules (Janssen,
De Mey, Egger, & Witteman, 2010).

The group expertise level is derived from the individual expertise levels. The order
in which the group expertise ratings are listed favors “group quality over individual
quality”. In other words, a group consisting of 2 L experts and 1 H expert ranks
lower in group expertise than a group with 1 L expert and 2 M experts. Table 5.4
shows all possible group expertise rankings resulting from individual expertise.


Table 5.4 Group expertise rated as an integer in order from individual expertise
levels. Source: own source

Individual expertise levels in group | Resulting group expertise
L, L, L | 1
L, L, M | 2
L, L, H | 3
L, M, M | 4
L, M, H | 5
M, M, M | 6
M, M, H | 7
L, H, H | 8
M, H, H | 9
H, H, H | 10
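Because the group rating depends only on the multiset of individual levels, Table 5.4 can be implemented as a lookup keyed by the sorted levels. A minimal Python sketch with hypothetical names (`GROUP_RANK`, `group_expertise`); as a consistency check, it reproduces the group ratings listed for the pretest groups in Table 5.5, e.g. M, M, L gives 4 and H, L, L gives 3.

```python
# Group expertise ranks from Table 5.4, keyed by the sorted multiset of levels.
LEVEL_ORDER = {"L": 0, "M": 1, "H": 2}

GROUP_RANK = {
    ("L", "L", "L"): 1,
    ("L", "L", "M"): 2,
    ("L", "L", "H"): 3,
    ("L", "M", "M"): 4,
    ("L", "M", "H"): 5,
    ("M", "M", "M"): 6,
    ("M", "M", "H"): 7,
    ("L", "H", "H"): 8,
    ("M", "H", "H"): 9,
    ("H", "H", "H"): 10,
}


def group_expertise(levels) -> int:
    """Return the group rating for three individual levels in any order."""
    key = tuple(sorted(levels, key=LEVEL_ORDER.__getitem__))
    return GROUP_RANK[key]
```

Sorting first makes the lookup independent of the order in which group members logged in, so "L, H, M" and "L, M, H" both map to 5.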
Another order of preference worth noting is L, H, H (8) over M, M, H (7). From a
set-theoretical viewpoint (removing the H common to both groups), L, H was
preferred over M, M. However, M, M, M (6) was preferred over L, M, H (5), where,
after removing the common M, M, M was preferred over L, H. From a purely logical
viewpoint, a contradiction therefore exists. M, M, M was preferred over L, M, H
because of the consistency of skill in this group, as one single participant who
behaved “less than wise” was able to derail an entire group strategy. This might
seem a weak argument for preferring group expertise 8 over 7; however, acquiring
expertise level H requires very high precision in ToH decision-making. A group with
expertise level 8 consists of two highly skilled experts, rendering “less than wise”
behavior by one single participant less likely to derail the group. The order of group
expertise is admittedly debatable, but it was constructed deliberately.


Table 5.5 Results of example experiment for explanation, part 4

Participant IDs | Expertise levels (group expertise) | IC
4, 5, 6 | M, M, L (4) | D-IC
7, 8, 9 | H, L, L (3) | R-IC
16, 17, 18 | M, H, H (9) | G-IC
31, 32, 33 | L, M, H (5) | G-IC
34, 35, 36 | L, H, M (5) | D-IC


Coming back to the first research question, several variables were identified.
Public information is either absent (N-IC) or comes in four distinct forms (G-IC,
D-IC, R-IC, and C-IC). Environmental conditions are all circumstances that lie
outside the agent’s control. Interpretations are not regarded as part of the
environmental conditions, even when “wrong interpretations” are facilitated by
environmental conditions. Two examples illustrate this. As participants of the N-IC
are not informed that they are playing with other agents, they are expected to
attribute outcomes that deviate from their expectations to “error”, such as software
bugs, glitches, randomizing variables, or wrong inputs, and not to human influences.
As participants of the G-IC are not informed that there is next to no chance of
discovering the true hidden rules, they are expected to attribute outcomes that
deviate from their expectations to “error”, such as poor expertise of co-agents,
human mistakes, or “bad cognitive skill” of co-agents.

The distinct information in each IC is considered public information and part of the
environmental conditions; its interpretation, however, is considered to be within
each agent’s control. Therefore, “public information about environmental change”
is part of the environmental conditions, lying outside the agent’s control, whereas
its interpretation and ultimately its impact on the individual’s behavior are
considered to be within each agent’s control.

Change stemming from environmental conditions is considered to be interpreted
either as an environmental or as a social influence. Environmental influences were
defined as all influences that are not “man-made”; social influences were defined as
all influences that are “man-made”. Environmental influence interpretations (EI-I)
were expected to lead participants to maximize control over expected outcomes by
sticking to their routine strategy. Social influence interpretations (SI-I) were
expected to lead participants to maximize control over expected outcomes by
deviating from their routine strategy. The gradual transition between deviation
distances stemming from EI-I and SI-I is explained by going through each
information condition in turn.

In the N-IC, participants were expected to attribute deviations from expected
outcomes mostly to environmental influences, as N-IC participants were not
informed that there were human co-agents.

In the G-IC, participants were expected to attribute deviations from expected
outcomes mostly to social influences, as G-IC participants were not informed, even
implicitly, that no agent was able to “outsmart” the hidden rules other than by
sticking to the regular single-player rules.

In the D-IC, participants were expected to attribute deviations from expected
outcomes less to social influences than in the G-IC, as D-IC participants were
implicitly informed that all agents were “still putting their trousers on one leg at a
time” and that looking for “patterns” to “outsmart” the hidden ruleset was a waste
of time. D-IC participants were also expected to attribute deviations from expected
outcomes less to environmental influences than in the N-IC, as D-IC participants
still knew that they had “some control” over the outcomes, which was in fact the
case.

The algorithm was written in such a way that each participant always had some
impact, and a chance of decisive impact, on the group action’s outcome, while never
having a chance of full control over it, because the order of inputs was decisive.
Even if the entire algorithm were known, communication would be required to
synchronize the order of inputs with the other co-agents and thus obtain full control
over the group action’s outcome. Although this is not entirely impossible, this thesis
expects no game group to optimize control over game group outcomes. When the
goal rod was the right rod, a game group could only “seemingly” optimize game
group output: a game group could then solve ToE in 7 steps with each individual
agent sticking to the F-L, regardless of the order of inputs. This was not the case
when the goal rod was the center rod. When the center rod was the goal rod, the
optimal move by F-L was “S1”, with “S1, S1, S1” resulting in the small disc. The
only realistic way of doing so without communication was for a game group to stick
to a certain “rhythm”, meaning that the order of information was stable, and for at
least one participant to provide an input outside of F-L at the right moment. It was
expected that such a dynamic decision-making equilibrium would not be observed.

In the R-IC, participants were expected to behave similarly to G-IC participants if they (mostly) still used the directional buttons; should R-IC participants (mostly) refrain from using the directional buttons, greater deviations than in the G-IC were expected. As the environmental condition “The directional buttons do not influence the game at all” never affects the performance of any strategy, some R-IC participants were expected to keep using the directional buttons out of routine strength. In other words, the routine strength of pressing directional buttons was considered to outweigh deviations from routine logic in some cases. For the same reason, even participants who set out to stop using the directional buttons were expected to still use them occasionally. Due to SI-I, R-IC participants were expected to deviate more from their routine strategy than N-IC and D-IC participants.

In the C-IC, participants were expected to behave similarly to D-IC participants when the directional buttons were (mostly) used, and greater deviations were expected when the directional buttons were (mostly) not used. C-IC participants were expected to deviate more from their routine strategy than N-IC participants, but less than G-IC and R-IC participants.

To formulate the corresponding hypotheses, the deviation distance from the routine strategy is defined and expressed by an index in the following. For now, the expected deviation distances (dd) of the conditions are ordered as follows:


$$\mathrm{dd}(\text{N-IC}) < \mathrm{dd}(\text{D-IC}) \le \mathrm{dd}(\text{C-IC}) < \mathrm{dd}(\text{G-IC}) \le \mathrm{dd}(\text{R-IC})$$


Therefore, the greatest deviations from the routine strategy were expected in the R-IC and the smallest in the N-IC.

For the pretest, the greatest difference in deviation distance from the routine strategy was expected between D-IC and R-IC, as no N-IC data were available. Computing the deviation distance from the routine strategy takes several steps, which are again explained using the pretest data of two game groups.
First, the proportion of ToH routine logic actions to the total number of ToH actions was measured in two ways, “ToH total” and “ToH parts”. A ToH total index of, e.g., 0.6842 with ToH routine F-L means that 68.42 % of the actions in ToH levels 2 to 7 were F-L actions. A ToH parts index of “1,0 / 0,5” means that 100 % of the actions in ToH levels 2 to 4 and 50 % of the actions in ToH levels 5 to 7 were F-L actions.
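The two ToH indices can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each action is recorded as a hypothetical `(level, is_fl)` pair, and the function names are illustrative, not taken from the thesis software. Because each index pools actions, the total is an action-weighted proportion, so it need not equal the mean of its two parts; the demo counts are invented and only chosen so that the output has the same form as participant 4's entry in table 5.6.

```python
from typing import Iterable, List, Tuple

# Hypothetical action record: (ToH level, True if the action was an F-L action).
Action = Tuple[int, bool]


def logic_proportion(actions: Iterable[Action], levels: range) -> float:
    """Share of F-L actions among all actions played in the given levels."""
    selected = [is_fl for level, is_fl in actions if level in levels]
    if not selected:
        return float("nan")  # no actions recorded for these levels
    return sum(selected) / len(selected)


def toh_indices(actions: List[Action]) -> Tuple[float, float, float]:
    """Return (ToH total, first ToH part, second ToH part)."""
    total = logic_proportion(actions, range(2, 8))  # levels 2 to 7
    part1 = logic_proportion(actions, range(2, 5))  # levels 2 to 4
    part2 = logic_proportion(actions, range(5, 8))  # levels 5 to 7
    return total, part1, part2


# Invented counts: 7 F-L actions in levels 2-4; 12 actions in levels 5-7, 6 of them F-L.
demo = [(2, True)] * 7 + [(5, True)] * 6 + [(6, False)] * 6
total, part1, part2 = toh_indices(demo)
print(f"{total:.4f} ({part1:.1f} / {part2:.1f})")  # 0.6842 (1.0 / 0.5)
```

Pooling means that a part with many actions weighs more heavily in the total than a part with few actions.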

Since the goal rod changed from the right rod to the center rod in ToH game 5, most players failed to solve ToH game 5 as efficiently as ToH game 4: they used their level 4 strategy to begin level 5, with actions that deviated from the ideal path. A change in the position of the goal rod was therefore considered a change of environmental conditions that affects a participant’s former routine strategy; accordingly, F-L has sub-routine strategies depending on the position of the goal rod. The same effect was expected in ToE games, since the goal rod changed from the right rod to the center rod in ToE game 4.

The highest proportion of ToH logic actions was achieved by participant 7, who was ranked with high expertise, and the lowest by participant 8, who was ranked with low expertise. Expertise rank and ToH total were therefore expected to correlate, which, together with the goal rod effect described above, leads to the first two hypotheses. All hypotheses and their dependent and independent variables are listed in the following sub-chapter.


5.2 Hypotheses and Variables 


As expertise and logic proportions were expected to correlate, and goal rod change 
was expected to influence performance, the first two hypotheses are as follows: 

Hypothesis 1: The higher the individual expertise rank, the higher the logic 
proportion “ToH total” is. 

Hypothesis 2: Change of goal rod during ToH and ToE games in the 4th level leads to the first actions in the same level deviating from the ideal path.

As table 5.6 shows, all participants in the D-IC condition stuck closely to their routine strategy’s logic during ToE levels one to three, reaching logic proportions of 95.24 %, 100 % and 100 %. Although participants faced environmental change, this change did not influence the routine strategy’s performance. As expected, the participants therefore did not deviate from their routine strategy at all (participants 5 and 6) or hardly at all (participant 4). Participants in the N-IC are expected to show significantly lower routine logic deviation than D-IC participants, which leads to the third hypothesis:
Hypothesis 3: Participants in the N-IC condition show the highest logic proportions in ToE levels one to three, expressed by “ToE parts 1”, followed by the proportions of D-IC participants, then C-IC, G-IC and R-IC.

As expected, routine logic deviations in the R-IC condition were higher than in the D-IC condition. While playing ToE, participants 4, 5 and 6 followed their routine logic in 76.27 %, 86.44 % and 74.58 % of all cases, whereas participants 7, 8 and 9 did so in only 23.57 %, 24.29 % and 22.86 % of all cases. N-IC participants were expected to show even higher logic proportions than D-IC participants, which leads to the fourth hypothesis:

Hypothesis 4: Participants in the N-IC condition show the highest total ToE 
logic proportion values, followed by proportions of D-IC participants, then C-IC, 
G-IC and R-IC. 

In the small pretest sample, routine logic deviations grew in the D-IC condition, as expected, because the change of the goal rod position influenced strategy performance. However, the goal rod change during ToE games has to be treated differently from the goal rod change during ToH games. During ToH games, the goal rod change influences performance because participants, e.g., do not notice the change and keep using their F-L logic, which would be ideal if the goal rod were the right rod rather than the center rod. This loss in performance can be quickly corrected in ToH by noticing the goal rod change and adapting the F-L to the new goal rod position. Participants who deviated from the ideal path in ToH level 5 but performed well in ToH level 4 were expected either to keep “trembling” throughout ToH levels 5 to 7, where the goal rod was the center rod, or to quickly learn and adapt their F-L to the new ToH goal rod conditions. However, in ToH games participants are not expected to be “surprised” by the outcome of their actions, measured as “expected states” deviation.
In ToE level 4, on the other hand, the goal rod change also influences the participants’ expected states deviation: an individual action input of “S1 r”, for example, might result in the small disk “seemingly” travelling to the left, or even in the medium or large disk being moved; such cases are expected to create an “expected states deviation”. Such expected states deviation can lead each individual agent to new interpretations. The environmental condition “goal rod change” in ToE level 4 is expected to have a lower influence on ToE routine logic deviations in levels 4, 5 and 6 than the experience of “expected states deviation”. To measure this, logic deviations in ToE parts 1, where no goal rod change has yet occurred, are also considered. “Expected states deviation” was expected to be a better predictor of ToE logic deviation than the environmental condition “goal rod change”, as the former was expected to lead to interpretation changes and thus to deeper uncertainty than the latter. Therefore, expected states deviations are expected to influence ToE logic deviations in all conditions of the experiment. In addition, higher expected states deviations were considered to lead to individual behavior that is increasingly not “captured” by any logic category, leading to high “logic marker” values. The logic marker is the number of ToE actions that score “0” in every logic category, divided by the total number of actions. In other words, high expected states deviations were expected to make participants behave “randomly” from the perspective of the experimenter.

Hypothesis 5: The higher the expected states deviation proportion values with respect to the routine strategy during all ToE conditions, the higher the logic deviation proportion values.

Hypothesis 6: The higher the expected states deviation proportion values with respect to the routine strategy during all ToE conditions, the higher the logic marker proportion values.


Table 5.6 Results of example experiment for explanation, part 5

ID | exp | ToH routine | ToE strategy | condition | ToH total (parts) | ToE total (parts 1 / 2)
4 | M | F-L | F-L dir | D-IC | 0,6842 (1,0 / 0,5) | 0,7627 (0,9524 / 0,6579)
5 | M | F-L | F-L dir | D-IC | 0,6111 (0,56 / 0,6552) | 0,8644 (1,0 / 0,7895)
6 | L | F-L | F-L dir | D-IC | 0,5 (0,5227 / 0,4737) | 0,7458 (1,0 / 0,6053)
7 | H | F-L | F-L nodir | R-IC | 0,7037 (0,6071 / 0,8077) | 0,2357 (0,1806 / 0,2941)
8 | L | F-L | F-L nodir | R-IC | 0,4386 (0,2692 / 0,5806) | 0,2429 (0,2361 / 0,25)
9 | L | F-L | F-L nodir | R-IC | 0,6250 (0,6757 / 0,5714) | 0,2286 (0,1806 / 0,2794)


R-IC and C-IC are expected to create higher expected states deviation from the routine strategy, as these conditions “take away” the basis for reinforcing the routine strategy, i.e. by informing participants about the “uselessness” of the direction buttons. During ToE parts 1, expected states deviation values with respect to the routine strategy were expected to be higher in the R-IC than in all other conditions.
As the logic deviation distance of the G-IC was expected to be higher than that of the C-IC, but the expected states distance of the C-IC was expected to be higher than that of the G-IC, ToE game group performance, measured as the total number of steps required to solve all six ToE games, is considered in order to indicate whether logic deviation or expected states deviation with respect to routine logic is the better predictor of ToE group performance. Expected states deviation can be considered a measure of “irritating” feedback when a certain logic is used and was considered to lead to fundamental interpretation changes. Expected states deviation is the result of an action, whereas logic deviation expresses an action already performed, embedding some former expectation. A high expected states deviation distance with respect to some logic is regarded as “more random feedback”, which, by Hypotheses 5 and 6, was considered to lead to higher logic deviation distances and seemingly random behavior. R-IC was expected to induce radical interpretation problems, making participants feel uncertain about their routine strategy. G-IC was expected to induce uncertainty through social influence: participants would try to adapt their strategy to certain “patterns”, ultimately changing it. G-IC participants were therefore expected to use different forms of logic, not just their routine logic, i.e. both F-L and NB-L, leading to a lower proportion of routine logic than in the C-IC condition, as only one logic form can be the routine logic.

Routine consistency is the number of actions during the ill-defined stages that fall into the participant’s routine logic category (F-L or NB-L, whichever is the routine logic), divided by the total number of actions during the ill-defined stages; sub-distinguishing elements of logic forms such as dir, nodir and ideal are disregarded for the calculation of routine consistency. An action that falls into neither the F-L nor the NB-L category is still added to the total number of actions by which the number of routine strategy actions is divided. Actions that fall outside of any known logic category are measured by the logic marker. For instance, suppose a player has developed the routine logic F-L in the well-defined stages and uses 100 actions in total during the ill-defined stages: 90 F-L actions (80 times dir, 5 times nodir, 5 times ideal) and 10 NB-L actions (4 times dir, 2 times nodir, 4 times ideal). This player has a logic marker of 0 (0.00 %), since all actions belong to known logic categories, and a routine consistency of 90/100 = 0.90 (90 %).
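The worked example can be reproduced with a short Python sketch. It assumes a hypothetical coding in which each ill-defined-stage action carries a `(logic, subtype)` label, with `(None, None)` standing for an action that scores 0 in every logic category; the function names are illustrative, not the thesis' own code.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical label: ("F-L", "dir"), ("NB-L", "ideal"), ... or (None, None) if uncategorised.
Label = Tuple[Optional[str], Optional[str]]


def routine_consistency(actions: List[Label], routine_logic: str) -> float:
    """Routine-logic actions (dir, nodir and ideal pooled) over all actions,
    with uncategorised actions still counted in the denominator."""
    counts = Counter(logic for logic, _subtype in actions)
    return counts[routine_logic] / len(actions)


def logic_marker(actions: List[Label]) -> float:
    """Share of actions that fall outside every known logic category."""
    return sum(1 for logic, _subtype in actions if logic is None) / len(actions)


# The example from the text: routine logic F-L, 100 ill-defined-stage actions.
example = (
    [("F-L", "dir")] * 80 + [("F-L", "nodir")] * 5 + [("F-L", "ideal")] * 5
    + [("NB-L", "dir")] * 4 + [("NB-L", "nodir")] * 2 + [("NB-L", "ideal")] * 4
)
print(routine_consistency(example, "F-L"))  # 0.9
print(logic_marker(example))                # 0.0
```

If the ten NB-L actions were instead uncategorised, routine consistency would stay at 0.90 while the logic marker would rise to 0.10, which is why the two indices are kept separate.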

Low routine consistency in G-IC was ultimately expected to lead to a greater logic deviation distance from routine logic than in the C-IC condition and, due to logic volatility, also to a higher deviation of expected states with respect to the routine logic. In the C-IC condition, participants were expected to “stick with one logic”, as they were “discouraged” by dissolution while still being exposed to a weakened form of social influence and to interpretation uncertainty. The D-IC lacks the interpretation uncertainty regarding the direction buttons and also comes with a weakened form of social influence. In other words, participants in the R-IC were expected to use different kinds of logic forms or strategies and to experience deep uncertainty with every strategy they tried, perhaps even to the point of performing actions arbitrarily. Participants in the G-IC were expected to use different kinds of strategies without experiencing deep uncertainty. Participants in the C-IC experienced deep uncertainty but were expected to be less volatile in forming their strategy than G-IC participants, while still deviating more from their routine strategy than D-IC participants.

Hypothesis 7: Expected states deviation proportion values during ToE parts 1, 
ToE parts 2 and ToE total in R-IC are the highest, followed by G-IC, C-IC, D-IC 
and lastly N-IC. 

Hypothesis 8: Routine consistency is the lowest in R-IC, followed by G-IC, 
C-IC, D-IC and N-IC. 

Group performance, measured as the number of group actions required to solve all ToE games, depends on the order of group actions. The algorithm is implemented in such a way that whenever all participants of a game group at least agree on the optimal disk to be moved, this collectively chosen disk is always moved, and the game group greatly outperforms random play, even if its members have different strategies in mind for how to move the disk. However, it was expected that even this “fundamental logic” would dissolve once deep uncertainty was induced by telling participants the truth about “the direction buttons not working”. The proportion of actions in which participants agreed on one disk, regardless of whether it was the optimal choice, was expected to be the best predictor of group performance; this proportion is expressed by the “fundamental index”.
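A minimal Python sketch of the fundamental index, assuming each group action is stored as the tuple of discs chosen by the group members; the disc labels and this data layout are hypothetical.

```python
from typing import Sequence


def fundamental_index(group_actions: Sequence[Sequence[str]]) -> float:
    """Proportion of group actions in which every member chose the same disc,
    regardless of whether that disc was the optimal one."""
    if not group_actions:
        raise ValueError("no group actions recorded")
    agreed = sum(1 for choices in group_actions if len(set(choices)) == 1)
    return agreed / len(group_actions)


# Illustrative three-person group: S = small, M = medium, L = large disc.
log = [("S", "S", "S"), ("M", "M", "M"), ("S", "M", "S"), ("L", "L", "L")]
print(fundamental_index(log))  # 0.75
```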

Hypothesis 9: The lower the fundamental index, the lower the game group performance.

Finally, it was expected that group expertise rank explains inter-condition logic 
deviations amongst groups. 

Hypothesis 10: Lower inter-condition group expertise rankings lead to lower 
logic deviation proportions. 

Table 5.7 (own source) lists the independent and dependent variables required for all 10 hypotheses, together with the corresponding hypothesis (H).
Table 5.7 Independent and dependent variables, with corresponding hypothesis

Independent variable | Dependent variable | H
ind. expertise rank | logic proportion “ToH total” | 1
goal rod change, level 4 in ToH and ToE, all conditions | starting routine logic values of 0 | 2
public information | logic proportion “ToE parts 1” in order N-IC > D-IC > C-IC > G-IC > R-IC | 3
public information | logic proportion “ToE total” in order N-IC > D-IC > C-IC > G-IC > R-IC | 4
expected state proportion during all ToE conditions | logic proportion | 5
expected state proportion during all ToE conditions | logic marker proportion | 6
public information | expected state proportion ToE total, ToE parts 1 and ToE parts 2 in order R-IC < G-IC < C-IC < D-IC < N-IC | 7
public information | strategy consistency index in order R-IC < G-IC < C-IC < D-IC < N-IC | 8
fundamental index during all ToE conditions | game group performance | 9
group expertise rank | logic proportion ToE total, ToE parts 1 and ToE parts 2 | 10
6 Results


An attempt to conduct the experiment with 330 US-American MTurk workers failed because the Amazon AWS “t2.micro 1 GiB” server ran out of memory. After upgrading the server to 32 GiB of working memory (Amazon AWS “t2.2xlarge 32 GiB”), an experiment with 180 US-American MTurk workers was conducted, of which the data of 87 participants were used. As anticipated, more than 50 % of participant data was lost due to connection errors, incorrect raw data, participants leaving the experiment, or participants playing in a game group with one or more bots. CPU utilization reached 55 % during the experiment, so these server settings are not recommended for larger numbers of participants.

A total of 29 female and 58 male participants, aged 33.16 years on average, were analyzed. Of the 87 participants, 9 reported having taken part in the experiment before, 63 reported not having taken part before, and 15 did not answer this question. A comparison of MTurk ID tables showed that all 9 participants who reported prior participation had been part of the 330-participant experiment, which crashed before the ToE stages were reached. Therefore, all participants were included. Participants from the example experiment mentioned before were not included.

Chapter 6 analyzes each hypothesis in its own sub-chapter, beginning with tests of whether the variables follow a normal (parametric) or a non-normal (nonparametric) distribution.


6.1 Testing For Nonparametric Distribution 


Variables were tested for normality using the One-Sample Kolmogorov-Smirnov test. The null hypothesis that the variable is normally distributed was rejected with high significance for 11 variables, which are listed


© The Author(s) 2021 129 
U. G. Strunz, The Impact of Individual Expertise and Public 

Information on Group Decision-Making, FOM-Edition Research, 
https://doi.org/10.1007/978-3-658-33139-9_6 




Hypothesis Test Summary

Null Hypothesis | Test
The categories of expertise occur with equal probabilities. | One-Sample Chi-Square Test
The distribution of strategy_volatility_marker1 is normal with mean 1 and standard deviation 0.222. | One-Sample Kolmogorov-Smirnov Test
The distribution of ToH_tot is normal with mean 0.6989 and standard deviation 0.258. | One-Sample Kolmogorov-Smirnov Test
The distribution of ToH_parts1 is normal with mean 0.7323 and standard deviation 0.267. | One-Sample Kolmogorov-Smirnov Test
The distribution of ToH_parts2 is normal with mean 0.6833 and standard deviation 0.290. | One-Sample Kolmogorov-Smirnov Test
The distribution of ToE_tot is normal with mean 0.7410 and standard deviation 0.224. | One-Sample Kolmogorov-Smirnov Test
The distribution of ToE_parts1 is normal with mean 0.7583 and standard deviation 0.255. | One-Sample Kolmogorov-Smirnov Test
The distribution of ToE_parts2 is normal with mean 0.7338 and standard deviation 0.240. | One-Sample Kolmogorov-Smirnov Test
The distribution of ToE_X_tot is normal with mean 0.4864 and standard deviation 0.225. | One-Sample Kolmogorov-Smirnov Test
The distribution of ToE_X_parts1 is normal with mean 0.6013 and standard deviation 0.251. | One-Sample Kolmogorov-Smirnov Test
The distribution of ToE_X_parts2 is normal with mean 0.4245 and standard deviation 0.224. | One-Sample Kolmogorov-Smirnov Test
The distribution of logic_marker is normal with mean 0.0796 and standard deviation 0.099. | One-Sample Kolmogorov-Smirnov Test

Asymptotic significances are displayed. The significance level is .05.
Kolmogorov-Smirnov tests: Lilliefors corrected.

Figure 6.1 Test results for nonparametric distribution of variables. Source own source
in figure 6.1. The hypothesis that the individual expertise categories occur with equal probability was rejected at the 0.01 level of significance. The list includes the variables for individual expertise, routine consistency, all logic proportions from the well-defined stages, all logic proportions from the ill-defined stages, all expected states from the ill-defined stages, and the logic marker index. Shapiro-Wilk tests also rejected all 12 null hypotheses of normality for the same variables with very high significance (p < 0.001). The distributions are therefore considered nonparametric, and nonparametric analyses are used for all hypotheses except Hypothesis 2.
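Figure 6.1 reports Lilliefors-corrected one-sample Kolmogorov-Smirnov tests. A minimal standard-library sketch of the underlying statistic D is given below. It computes D against a normal distribution whose mean and SD are estimated from the sample, but not the Lilliefors p-value, which needs tabulated or simulated critical values (statsmodels provides `statsmodels.stats.diagnostic.lilliefors` for that). The samples are invented for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_d_estimated_normal(sample: Sequence[float]) -> float:
    """Kolmogorov-Smirnov D between the empirical CDF and a normal CDF whose
    mean and SD are estimated from the same sample (the Lilliefors setting)."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # Empirical CDF is i / n just before x and (i + 1) / n just after it;
        # with ties, the first tied value supplies the "before" side.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Invented samples: a symmetric one and one piled up at the ceiling of 1.0,
# a pattern that proportion indices such as logic proportions can show.
symmetric = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
ceiling = [0.1, 0.3, 1.0, 1.0, 1.0, 1.0, 1.0]
print(round(ks_d_estimated_normal(symmetric), 3), round(ks_d_estimated_normal(ceiling), 3))
```

Bounded indices with many values at 0 or 1 tend to produce large D values, which is consistent with normality being rejected for these variables.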


6.2 Expertise Rank and Logic Proportion 


Hypothesis 1: The higher the individual expertise rank, the higher the logic 
proportion “ToH total” is. 

Individual expertise rank was categorized as “low”, “medium” or “high”. Agents who failed more than one ToH game because the timer ran out were always assigned to the “low” rank. Otherwise, agents who completed four or more ToH games in 7 steps were assigned to the “high” category, agents who completed two or three ToH games in 7 steps to the “medium” category, and agents who completed one or no ToH game in 7 steps to the “low” category.
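The classification rules can be written as a small Python function. The parameter names are hypothetical; the numeric codes 0, 1 and 2 follow the coding used below for low, medium and high expertise, and the timeout rule is checked first because it always overrides the step count.

```python
def expertise_rank(games_failed_by_timer: int, games_solved_in_7: int) -> int:
    """Expertise code: 0 = low, 1 = medium, 2 = high."""
    if games_failed_by_timer > 1:
        return 0  # more than one timeout always means "low"
    if games_solved_in_7 >= 4:
        return 2  # four or more ToH games solved in 7 steps
    if games_solved_in_7 >= 2:
        return 1  # two or three games solved in 7 steps
    return 0      # one or no game solved in 7 steps


print([expertise_rank(0, 5), expertise_rank(1, 3), expertise_rank(0, 1), expertise_rank(2, 4)])
# [2, 1, 0, 0]
```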

33 agents were part of the “low” expertise group, 16 agents of the “medium” expertise group and 38 agents of the “high” expertise group.

“ToH total” is the proportion of ideal routine strategy steps used in all ToH games except the first game. Spearman’s rho showed a correlation between expertise rank and ToH total that is significant at the 0.01 level (2-tailed), as shown in figure 6.2.
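Because expertise takes only three values, most observations are tied. A minimal Python sketch of Spearman's rho that assigns average ranks to ties and then takes the Pearson correlation of the ranks is shown below. It computes the coefficient only, not the significance test reported in figure 6.2, and the demo data are invented.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented data: expertise codes (0/1/2) and ToH total values for six agents.
expertise = [0, 0, 1, 1, 2, 2]
toh_total = [0.40, 0.55, 0.50, 0.72, 0.88, 0.95]
print(round(spearman_rho(expertise, toh_total), 3))
```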

Agents with low ToH expertise (0) had a mean index of 0.4463 ToH total (std. 
error 0.0338, std. deviation 0.1943). Agents with medium expertise (1) had a mean 
index of 0.7231 ToH total (std. error 0.03646, std. deviation 0.1458). Agents with 
high expertise (2) had a mean index of 0.9080 ToH total (std. error 0.0172, std. 
deviation 0.1063). Figure 6.3 shows specifics as a box-plot diagram. 

A Kruskal-Wallis H test shows that the differences in the ToH total index between expertise groups are highly significant (H(2, N = 87) = 60.604, p < 0.001; group sizes 33, 16 and 38).
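The Kruskal-Wallis statistic can be sketched in Python with the standard correction for ties. This is an illustrative implementation, not the statistics package used for the analysis, and the toy groups are invented. With three expertise groups the test has k - 1 = 2 degrees of freedom, and for 2 degrees of freedom the chi-square upper tail has the closed form exp(-H/2), which shows that H = 60.604 corresponds to a p-value far below 0.001.

```python
import math
from collections import Counter
from itertools import chain
from typing import Sequence, Tuple


def kruskal_wallis_h(*groups: Sequence[float]) -> Tuple[float, int]:
    """Kruskal-Wallis H with the usual correction for ties; returns (H, df)."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    counts = Counter(pooled)
    first_position = {}
    for position, value in enumerate(pooled, start=1):
        first_position.setdefault(value, position)
    # A value occupying positions p .. p + t - 1 gets the average rank p + (t - 1) / 2.
    avg_rank = {v: first_position[v] + (t - 1) / 2 for v, t in counts.items()}
    rank_term = sum(sum(avg_rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_sum = sum(t ** 3 - t for t in counts.values())
    h /= 1 - tie_sum / (n ** 3 - n)  # undefined if all values are identical
    return h, len(groups) - 1


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 df: exp(-x / 2)."""
    return math.exp(-x / 2)


h, df = kruskal_wallis_h([1, 1, 2], [2, 3, 3])
print(round(h, 4), df)              # 3.3333 1
print(chi2_sf_df2(60.604) < 0.001)  # True
```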

Hypothesis 1 is therefore confirmed: expertise rank correlates significantly with the ToH total index (Spearman’s rho = 0.839), and the differences between expertise groups are significant. The mean logic proportion “ToH_total” of high expertise participants differs significantly from that of low and medium expertise participants.
Correlations

Spearman's rho | | expertise | ToH_tot
expertise | Correlation Coefficient | 1.000 | .839**
 | Sig. (2-tailed) | . | .000
 | N | 87 | 87
ToH_tot | Correlation Coefficient | .839** | 1.000
 | Sig. (2-tailed) | .000 | .
 | N | 87 | 87

** Correlation is significant at the 0.01 level (2-tailed).

Figure 6.2 Correlation results of expertise and ideal routine strategy in “well-defined” stages. Source own source
Figure 6.3 Boxplot results of expertise levels and logic proportion during “well-defined” stages. Source own source


6.3 Environmental Change and Human Error 


Hypothesis 2: Change of goal rod during ToH and ToE games in the 4th level 
leads to the first actions in the same level deviating from the ideal path. 
To test this hypothesis, the first action of each of the six ToH games was analyzed as to whether it was an “ideal” move by F-L. NB-L is excluded from this analysis, as not a single ToE game was started by any of the 87 participants with an ideal NB-L move. The hypothesis was not analyzed for ToE games, as too many factors besides the goal rod change influenced individual behavior, which would make a statistical analysis questionable. The hypothesis was therefore modified to:

Hypothesis 2: Change of goal rod during ToH games in the 4th level leads to 
the first actions in the same level deviating from the ideal path. 

As shown in table 6.1 (own source), the proportion of non-ideal first moves fell from 45.98 % (n = 87) in ToH game 1 to 30.59 % (n = 85) in ToH game 3. With the introduction of the goal rod change in ToH game 4, the proportion of non-ideal first moves rose to 51.16 % (n = 86), which is even higher than the initial “mistake” proportion.

The mean proportion of non-ideal first moves across all six ToH games, 0.4180 (std. deviation 0.0685, std. error 0.0278), differs significantly from the game 4 value of 0.5116 (51.16 %), with p = 0.020. It does not differ significantly from the second-highest value, 0.4598 (45.98 %), with p = 0.195.

Modified Hypothesis 2 is therefore confirmed: the mistake rate on the first action in game 4, where the goal rod was changed, differed significantly from the mean mistake proportion.


Table 6.1 Impact of “macrostructure shift” on decision-making performance. Source own source

 | ToH game 1 | ToH game 2 | ToH game 3 | ToH game 4 (goal rod change) | ToH game 5 | ToH game 6
not ideal | 40 | 36 | 26 | 44 | 35 | 34
ideal | 47 | 50 | 59 | 42 | 50 | 51
total | 87 | 86 | 85 | 86 | 85 | 85
rel. not ideal | 0,459770115 | 0,418604651 | 0,305882 | 0,511628 | 0,411765 | 0,4
rel. ideal | 0,540229885 | 0,581395349 | 0,694118 | 0,488372 | 0,588235 | 0,6
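The text does not name the test behind p = 0.020 and p = 0.195. One reading consistent with the reported figures is a one-sample t-test of the six per-game proportions against each test value (5 degrees of freedom). The Python sketch below recomputes the descriptive statistics and t statistics from table 6.1 under that assumption; p-values would then follow from a t distribution, e.g. via `scipy.stats.ttest_1samp`.

```python
import math
from statistics import mean, stdev

# Table 6.1: non-ideal first moves and totals for ToH games 1 to 6.
not_ideal = [40, 36, 26, 44, 35, 34]
totals = [87, 86, 85, 86, 85, 85]
props = [k / n for k, n in zip(not_ideal, totals)]

m = mean(props)
sd = stdev(props)                # sample SD, n - 1 in the denominator
se = sd / math.sqrt(len(props))  # standard error of the mean


def one_sample_t(test_value: float) -> float:
    """t statistic of the six game proportions against a fixed value (df = 5)."""
    return (m - test_value) / se


print(f"mean {m:.4f}, SD {sd:.4f}, SE {se:.4f}")
print(f"t vs game 4: {one_sample_t(44 / 86):.2f}, t vs game 1: {one_sample_t(40 / 87):.2f}")
```

This gives a mean of about 0.418, an SD of about 0.0685, and |t| values of roughly 3.35 and 1.50, which at 5 degrees of freedom are in line with the reported p-values.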
6.4 Information Conditions and Logic Deviation


Hypothesis 3: Participants in the N-IC condition show the highest logic pro- 
portions in ToE levels one to three, expressed by “ToE parts 1”, followed by 
proportions of D-IC participants, then C-IC, G-IC and R-IC. 

Logic proportion is an index representing the proportion of an agent’s actions that are routine logic actions. The lower the index, the more the agent deviated from its routine strategy. The index “ToE parts 1” refers to the first three ToE games, which could be solved in 7 steps by sticking to the framed logic. The order anticipated by Hypothesis 3 was N-IC > D-IC > C-IC > G-IC > R-IC.

18 agents were part of the N-IC condition (6 groups), 24 agents of the G-IC condition (8 groups), 15 agents of the D-IC condition (5 groups), 15 agents of the R-IC condition (5 groups) and 15 agents of the C-IC condition (5 groups). This allocation applies to all hypotheses.

The mean ToE parts 1 index of the N-IC was 0.7113 (std. error 0.0677, std. deviation 0.2873), with a range of 0.8. The mean ToE parts 1 index of the G-IC was 0.7596 (std. error 0.0580, std. deviation 0.2841), with a range of 0.75. The mean ToE parts 1 index of the D-IC was 0.6429 (std. error 0.0689, std. deviation 0.2666), with a range of 0.8. The mean ToE parts 1 index of the R-IC was 0.9179 (std. error 0.0508, std. deviation 0.1966), with a range of 0.36. The mean ToE parts 1 index of the C-IC was 0.7685 (std. error 0.0508, std. deviation 0.1966), with a range of 0.3636. Figure 6.4 shows the box-plot data.
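The standard error of the mean equals the standard deviation divided by the square root of the group size. The short Python sketch below recomputes it from the reported standard deviations and group sizes, which offers a quick consistency check on such figures; for the N-IC it yields about 0.0677.

```python
import math

# Group sizes and reported standard deviations of the ToE parts 1 index.
groups = {
    "N-IC": (18, 0.2873),
    "G-IC": (24, 0.2841),
    "D-IC": (15, 0.2666),
    "R-IC": (15, 0.1966),
    "C-IC": (15, 0.1966),
}

# Standard error of the mean: SD divided by the square root of the group size.
se = {name: sd / math.sqrt(n) for name, (n, sd) in groups.items()}
for name, value in se.items():
    print(f"{name}: {value:.4f}")
```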

A Kruskal-Wallis H test shows significant differences between conditions in the ToE parts 1 index (H(4, N = 87) = 10.119, p = 0.038; group sizes 18, 24, 15, 15 and 15).

Hypothesis 3 cannot be confirmed. The observed order of ToE parts 1 by information condition is R-IC > C-IC > G-IC > N-IC > D-IC, and the differences between conditions on this index were significant. The “routine information condition” shows the lowest routine logic deviation, while the “dissolution information condition” shows the highest routine logic deviation during the first three ToE games.
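The comparison between the observed and the hypothesized order can be made explicit with a few lines of Python using the reported means. This is a descriptive check only: it ranks condition means, whereas the Kruskal-Wallis test works on the ranks of individual values.

```python
from itertools import combinations

# Mean ToE parts 1 index per information condition, as reported above.
means = {"N-IC": 0.7113, "G-IC": 0.7596, "D-IC": 0.6429, "R-IC": 0.9179, "C-IC": 0.7685}
hypothesized = ["N-IC", "D-IC", "C-IC", "G-IC", "R-IC"]  # Hypothesis 3, highest first

observed = sorted(means, key=means.get, reverse=True)
print(" > ".join(observed))  # R-IC > C-IC > G-IC > N-IC > D-IC

# Count how many of the 10 pairwise predictions "a > b" hold for the means.
held = sum(means[a] > means[b] for a, b in combinations(hypothesized, 2))
print(f"{held} of 10 pairwise predictions hold")  # 2 of 10 pairwise predictions hold
```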


6.5 Complete Logic Proportions Over Information Conditions


Hypothesis 4: Participants in the N-IC condition show the highest total ToE logic 
proportion values, followed by proportions of D-IC participants, then C-IC, G-IC 
and R-IC. 
Figure 6.4 Boxplot results of logic proportion during “metastable” conditions over all information conditions: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC. Source own source


The index “ToE total” refers to all six ToE games. The order anticipated by Hypothesis 4 was N-IC > D-IC > C-IC > G-IC > R-IC.

The mean ToE total index of the N-IC was 0.7000 (std. error 0.0551, std. deviation 0.2339), with a range of 0.7218. The mean ToE total index of the G-IC was 0.7409 (std. error 0.0580, std. deviation 0.2841), with a range of 0.6923. The mean ToE total index of the D-IC was 0.7148 (std. error 0.0611, std. deviation 0.2366), with a range of 0.6768. The mean ToE total index of the R-IC was 0.7970 (std. error 0.0475, std. deviation 0.1839), with a range of 0.5584. The mean ToE total index of the C-IC was 0.7609 (std. error 0.0546, std. deviation 0.2114), with a range of 0.6205. Figure 6.5 shows the box-plot data.

A Kruskal-Wallis H test shows no significant differences between conditions in the ToE total index (H(4, N = 87) = 2.408, p = 0.661; group sizes 18, 24, 15, 15 and 15).

Hypothesis 4 cannot be confirmed. Ranked by mean ToE total, the observed order of information conditions is R-IC > C-IC > G-IC > D-IC > N-IC, but the differences between conditions on this index were not significant. The “routine information condition” shows the lowest routine logic deviation, while the N-IC shows the highest routine logic deviation across all six ToE games.
Figure 6.5 Boxplot results of logic proportion during “ill-defined” conditions over all information conditions: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC. Source own source


6.6 Expected States and Logic Proportion 


Hypothesis 5: The higher the expected states proportion values with respect to the routine strategy during all ToE conditions, the higher the logic proportion values.

Expected states proportion is an index referring to the proportion of actions that were followed by the expected outcome, with respect to the actions’ routine logic. The higher the expected states proportion, the lower the expected states deviation, and vice versa. Hypothesis 5 therefore assumed that low expected states proportions correlate with low logic proportion values, and high expected states proportions with high logic proportion values.

As with the logic proportion indexes, there are three expected states proportion indexes: “ToE_X_tot” refers to the expected states in all six ToE games, “ToE_X_parts1” to those in the first three ToE games, and “ToE_X_parts2” to those in the last three ToE games. All three expected states indexes were compared with all three logic proportion indexes, ToE total, ToE parts 1 and ToE parts 2.
Spearman’s rho correlation was significant at the 0.001 level (2-tailed) between
all expected states and logic proportion indexes. Figure 6.6 summarizes these
data.
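Spearman’s rho, used for these correlations, is the Pearson correlation of the rank-transformed variables, with tied values receiving the mean of the ranks they occupy. The following pure-Python sketch computes the coefficient only. The significance levels reported in this chapter come from the statistical output used in the thesis, which this sketch does not attempt to reproduce, and the example values are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sample has no defined rank correlation")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-agent values: expected states proportion vs. logic proportion
expected_states = [0.31, 0.55, 0.48, 0.72, 0.90]
logic_proportion = [0.40, 0.52, 0.61, 0.70, 0.88]
print(round(spearman_rho(expected_states, logic_proportion), 3))
```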

Hypothesis 5 was confirmed; expected states correlations with logic proportion 
were found to be highly significant. 


6.7 Expected States and Logic Marker Proportion 


Hypothesis 6: The higher the expected states proportion values with respect to the
routine strategy during all ToE conditions, the lower the logic marker proportion
values.

The logic marker is an index representing the proportion of an agent’s ToE actions
that were not “captured” by any logic index. From the perspective of this thesis’
model, such actions can be regarded as “random”. It was expected that agents
who experienced many actions being followed by their expected outcome would
stick to some logic framed by the model. In other words, it was expected that
agents who experienced seemingly “random” outcomes would also behave
randomly. The higher the logic marker index, the more “random” the agent’s
behavior; the lower the logic marker index, the more this thesis’ model can make
sense of the agent’s behavior. Therefore, a high expected states proportion was
anticipated to lead to low logic marker values, and thus to “less random behavior
from the model’s perspective” (Figure 6.7).
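As a minimal sketch, assuming each action has already been labelled with the logic category the model assigned to it, or with `None` when no logic index captured it, the logic marker is the share of unlabelled actions. The label values and the function name are hypothetical.

```python
from typing import Optional, Sequence


def logic_marker(categories: Sequence[Optional[str]]) -> float:
    """Share of actions that no logic index captured (label None), i.e. 'random' actions."""
    if not categories:
        raise ValueError("no actions recorded")
    uncaptured = sum(1 for label in categories if label is None)
    return uncaptured / len(categories)


# Hypothetical labels for five actions of one agent
actions = ["F-L", None, "NB-L", "F-L", None]
print(logic_marker(actions))
```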

ToE_X_tot correlation with logic marker values was significant at the 0.01 
level (2-tailed). ToE_X_parts1 correlation with the logic marker values was signi- 
ficant at the 0.05 level (2-tailed). ToE_X_parts2 correlation with the logic marker 
values was significant at the 0.01 level (2-tailed). Figure 6.7 sums up the results. 

Hypothesis 6 was confirmed. All correlations of the expected states indexes with the
logic marker index were either significant (p = 0.024) or highly significant
(p < 0.001).


6.8 Complete Expected States Over Information Conditions


Hypothesis 7: Expected states proportion values during ToE parts 1, ToE parts 2 
and ToE total in R-IC are the highest, followed by G-IC, C-IC, D-IC and lastly 
N-IC. 
Correlations (Spearman’s rho)

              |                         | ToE_X_tot | ToE_X_parts1 | ToE_X_parts2 | logic_marker
ToE_X_tot     | Correlation Coefficient | 1.000     | .951**       | .953**       | -.448**
              | Sig. (2-tailed)         |           | .000         | .000         | .000
              | N                       | 87        | 87           | 87           | 87
ToE_X_parts1  | Correlation Coefficient | .851**    | 1.000        | .702**       | -.241*
              | Sig. (2-tailed)         | .000      |              | .000         | .024
              | N                       | 87        | 87           | 87           | 87
ToE_X_parts2  | Correlation Coefficient | .953**    | .702**       | 1.000        | -.440**
              | Sig. (2-tailed)         | .000      | .000         |              | .000
              | N                       | 87        | 87           | 87           | 87
logic_marker  | Correlation Coefficient | -.448**   | -.241*       | -.440**      | 1.000
              | Sig. (2-tailed)         | .000      | .024         | .000         |
              | N                       | 87        | 87           | 87           | 87

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Figure 6.7 Correlation results between expected states and logic marker. Source own source


The anticipated order of expected states proportion values was:
R-IC>G-IC>C-IC>D-IC>N-IC.

The mean ToE_X_total index of the N-IC was 0.4435 (std. error 0.0573,
std. deviation 0.2431), with a range of 0.7358. The mean ToE_X_total index
of the G-IC was 0.5322 (std. error 0.0490, std. deviation 0.2401), with a range
of 0.7407. The mean ToE_X_total index of the D-IC was 0.5322 (std. error
0.0490, std. deviation 0.2401), with a range of 0.7407. The mean ToE_X_total
index of the R-IC was 0.5171 (std. error 0.0486, std. deviation 0.1882), with a
range of 0.68. The mean ToE_X_total index of the C-IC was 0.4076 (std.
error 0.0620, std. deviation 0.2401), with a range of 0.6552. Figure 6.8 shows the
boxplot data.
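The reported standard errors appear to follow the usual relation SE = SD / √n with the group sizes 18 (N-IC), 24 (G-IC) and 15 (D-IC, R-IC, C-IC); for example, 0.2431 / √18 ≈ 0.0573 for the N-IC. The sketch below shows how such descriptive statistics could be computed from per-agent index values with Python’s standard library, assuming the sample (n − 1) standard deviation. The function name is illustrative.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, standard error and range of an index."""
    if len(values) < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    sd = stdev(values)  # sample SD with an n - 1 denominator
    return {
        "mean": mean(values),
        "std_deviation": sd,
        "std_error": sd / math.sqrt(len(values)),
        "range": max(values) - min(values),
    }


# Check against the N-IC ToE_X_total figures: SD 0.2431 with n = 18
print(round(0.2431 / math.sqrt(18), 4))  # 0.0573
```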

Differences by ToE_X_total in all five conditions were not significant accor- 
ding to Kruskal-Wallis-H: (H(18, 24, 15, 15, 15) = 4.766, p = 0.312). Neverthe- 
less, the observed order by this index was G-IC>R-IC>D-IC>N-IC>C-IC. 
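The Kruskal-Wallis H statistic ranks all values jointly, compares the groups’ rank sums, and is usually referred to a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups (here five information conditions, so 4 degrees of freedom). For an even number of degrees of freedom the chi-square tail probability has a closed form, which reproduces the p-values reported in this section to three decimals (for example, H = 4.766 gives p ≈ 0.312 and H = 2.408 gives p ≈ 0.661). The pure-Python sketch below includes the standard tie correction; it is an illustration, not the software routine used in the thesis.

```python
import math
from typing import Dict, Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H for k independent groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted(v for group in groups for v in group)
    n_total = len(pooled)

    # Assign each distinct value the mean of the 1-based ranks it occupies
    rank_of: Dict[float, float] = {}
    tie_sum = 0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    correction = 1 - tie_sum / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical")
    return h / correction


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom >= x), closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only valid for positive even df")
    half = x / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        if k > 0:
            term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Reported five-condition results (df = 5 - 1 = 4)
for h_value in (2.408, 4.766, 8.944, 3.874, 5.018):
    print(h_value, round(chi2_sf_even_df(h_value, 4), 3))
```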

The mean ToE_X_parts1 index of the N-IC was 0.5374 (std. error 0.0672,
std. deviation 0.2853), with a range of 0.8571. The mean ToE_X_parts1
index of the G-IC was 0.6620 (std. error 0.0537, std. deviation 0.2633), with
a range of 0.8667. The mean ToE_X_parts1 index of the D-IC was 0.5730
(std. error 0.0536, std. deviation 0.2075), with a range of 0.5826. The mean
ToE_X_parts1 index of the R-IC was 0.6983 (std. error 0.0464, std. deviation
0.1797), with a range of 0.6750. The mean ToE_X_parts1 index of the C-IC
was 0.5119 (std. error 0.0666, std. deviation 0.2579), with a range of 0.6971.

Figure 6.8 Boxplot results of expected states during “ill-defined” stages over information
conditions: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC. Source own source

Differences by ToE_X_parts1 in all five conditions were found to be significant 
at the 0.1 level according to Kruskal-Wallis-H: (H(18, 24, 15, 15, 15) = 8.944, p 
= 0.063). The observed order by this index was R-IC>G-IC>D-IC>N-IC>C-IC. 

The mean ToE_X_parts2 index of the N-IC was 0.3920 (std. error 0.0525,
std. deviation 0.2227), with a range of 0.6774. The mean ToE_X_parts2 index
of the G-IC was 0.4667 (std. error 0.0455, std. deviation 0.2228), with a range of
0.7. The mean ToE_X_parts2 index of the D-IC was 0.4655 (std. error 0.0537,
std. deviation 0.2078), with a range of 0.7. The mean ToE_X_parts2 index of
the R-IC was 0.4210 (std. error 0.0566, std. deviation 0.2192), with a range of
0.7. The mean ToE_X_parts2 index of the C-IC was 0.3585 (std. error 0.0653,
std. deviation 0.2528), with a range of 0.6389.

Differences by ToE_X_parts2 in all five conditions were not significant
according to Kruskal-Wallis-H (H(18, 24, 15, 15, 15) = 3.874, p = 0.423).
The observed order by this index was G-IC>D-IC>R-IC>N-IC>C-IC.

Hypothesis 7 was not confirmed. The observed order by expected states proportion
differed between ToE_X_total, ToE_X_parts1 and ToE_X_parts2, and only
ToE_X_parts1 differed between conditions, with marginal significance (p = 0.063).
6.9 Routine Consistency


Hypothesis 8: Routine consistency index is the lowest in R-IC, followed by G-IC, 
C-IC, D-IC and N-IC. 

The routine consistency is the proportion of all actions during the ill-defined
stages that fall into the routine logic category (either F-L or NB-L), where
dir/nodir/ideal are not distinguished. Actions that do not fall into any category
are still counted in the total number of actions. The higher the routine
consistency, the more of an agent’s actions fall into the routine strategy category;
the lower the routine consistency, the higher the agent’s routine volatility. Since
agents were anticipated to switch their strategy most often in the R-IC, this
condition was anticipated to show the lowest routine consistency. The anticipated
routine consistency order was N-IC>D-IC>C-IC>G-IC>R-IC.
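As a minimal sketch, assuming each ill-defined-stage action has already been assigned a single base logic label (or `None` when it fits no category), with the dir/nodir/ideal distinction collapsed beforehand, routine consistency could be computed as follows. The label `"other"` and the function name are hypothetical; only F-L and NB-L are taken from the text.

```python
from typing import Optional, Sequence

# Routine logic categories named in the text; dir/nodir/ideal variants are
# assumed to have been collapsed into these base labels beforehand.
ROUTINE_CATEGORIES = {"F-L", "NB-L"}


def routine_consistency(categories: Sequence[Optional[str]]) -> float:
    """Share of ill-defined-stage actions labelled with a routine logic category.

    Actions with no category (None) stay in the denominator.
    """
    if not categories:
        raise ValueError("no actions recorded")
    routine = sum(1 for label in categories if label in ROUTINE_CATEGORIES)
    return routine / len(categories)


# Hypothetical labels: "other" stands for any non-routine logic category
labels = ["F-L", "NB-L", None, "other", "F-L"]
print(routine_consistency(labels))
```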

The mean routine consistency of the N-IC was 0.6511 (std. error 0.0491,
std. deviation 0.2081), with a range of 0.72. The mean routine consistency of
the G-IC was 0.7250 (std. error 0.0490, std. deviation 0.2400), with a range of
0.69. The mean routine consistency of the D-IC was 0.7140 (std. error 0.0615,
std. deviation 0.2382), with a range of 0.68. The mean routine consistency of
the R-IC was 0.7853 (std. error 0.0533, std. deviation 0.2066), with a range of
0.63. The mean routine consistency of the C-IC was 0.7607 (std. error 0.0544,
std. deviation 0.2108), with a range of 0.62.

Differences by routine consistency in all five conditions were found to be not 
significant according to Kruskal-Wallis-H: (H(18, 24, 15, 15, 15) = 5.018, p = 
0.285). The observed order by this index was R-IC>C-IC>G-IC>D-IC>N-IC. 

Hypothesis 8 was not confirmed. The routine consistency did not differ 
significantly over all information conditions, and the observed order by routine 
consistency differed from what was anticipated. 


6.10 Fundamental Strategy and Group Performance 


Hypothesis 9: The lower the fundamental index, the lower the game group
performance.

The fundamental index shows the proportion of group decisions in which all
agents agreed on which disk to move. The lower this proportion, the higher the
number of steps was expected to be, as represented by the variable
“performance_toe”. Again, “performance_toe” is the number of steps saved by a
group solving all ToE games. However, if a game group failed to solve a ToE stage
in time (3 minutes), the number of steps saved does not represent the number of
steps required to solve that ToE stage.
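Assuming, hypothetically, that each group decision is logged as the tuple of disks proposed by the agents of a game group, the fundamental index reduces to the share of unanimous decisions. The sketch below is illustrative only; the data format and the function name are assumptions, not the thesis’ logging format.

```python
from typing import Sequence, Tuple


def fundamental_index(decisions: Sequence[Tuple[int, ...]]) -> float:
    """Share of group decisions in which every agent chose the same disk."""
    if not decisions:
        raise ValueError("no group decisions recorded")
    unanimous = sum(1 for choices in decisions if choices and len(set(choices)) == 1)
    return unanimous / len(decisions)


# Hypothetical decisions of a three-agent game group (disk chosen by each agent)
decisions = [(1, 1, 1), (1, 2, 1), (3, 3, 3), (2, 2, 2)]
print(fundamental_index(decisions))
```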

If this is ignored, Spearman’s rho shows the correlation between the fundamental
index and the number of steps saved by a group attempting to solve all ToE games
to be significant at the 0.01 level (2-tailed), with p = 0.002. Therefore, the lower
the fundamental index, the higher the variable “performance_toe”.

However, the number of steps required to solve all ToE games is not represented
by “performance_toe”. For this reason, the “solved” variable was included, which
marks the games a group solved. However, the variable “solved” proved
unreliable, as it also marked games that were not solved by action but ended
because the group failed to solve them in time.

Therefore, hypothesis 9 was not confirmed. The lower the proportion of group
actions in which all agents agreed on which disk to move, the more steps it took
to solve all ToE games; however, the number of steps required did not represent
group performance.


6.11 Group Expertise and Logic Proportions 


Hypothesis 10: Lower inter-condition group expertise rankings lead to lower logic
deviation proportions.

Group expertise is calculated from the individual expertise levels of the members
of one game group (see table 5.4). It was assumed that group expertise correlates
with group behavior and therefore impacts logic deviation. When information
conditions are disregarded, group expertise seems to correlate strongly and
positively with the proportion of routine strategy actions over all information
conditions (N-IC, G-IC, D-IC, R-IC, C-IC) and all ill-defined system states
(metastable, instable). The higher the deviation proportion index, the less an
agent deviated from the routine it established in the well-defined stages. Group
expertise correlated significantly and positively at the 0.01 level with the
deviation proportion index of all ill-defined stages (ToE tot, p = 0.001) and of
the metastable ill-defined stages (ToE 1, p < 0.001), and correlated significantly
and positively at the 0.05 level with the deviation proportion index of the
instable ill-defined stages (ToE 2, p = 0.028). To avoid confusion, it should
be noted again that, at first sight, this analysis can be interpreted as follows:
the higher the group expertise, the less the group deviates from its routine
strategy, which was learned during the well-defined stages.

However, these results consider 87 individuals, each assigned the expertise of the
game group surrounding them. It is debatable whether these results are valid,
as group expertise has to be considered the result of an entire group, and the
groups faced different information conditions. Therefore, the following analysis is
more precise, as it considers groups as a whole together with their respective
information conditions.

In the N-IC condition, which held 18 participants in 6 game groups, 9
agents were part of a game group with a group expertise of “3”. Three agents
each were part of a game group with a group expertise of “5”, “7” and “9”,
respectively. Kruskal-Wallis H (7.066) showed the difference in ToE total indexes
across group expertise levels in the N-IC to be marginally significant, with
p = 0.070. Spearman’s rho measured the correlation between the N-IC group
expertise surrounding an agent and the agent’s ToE total index to be significant at
the 0.05 level (2-tailed), with p = 0.015.

In the G-IC condition, which held 24 participants in 8 game groups, six
agents each were part of game groups with a group expertise of “1” and of “9”.
Three agents each were part of a game group with a group expertise of “2”, “3”,
“8” and “10”, respectively. Kruskal-Wallis H (12.951) showed the difference in ToE
total indexes across group expertise levels in the G-IC to be significant, with
p = 0.024. Spearman’s rho measured the correlation between the G-IC group
expertise surrounding an agent and the agent’s ToE total index to be significant at
the 0.01 level (2-tailed), with p = 0.009.

In the D-IC condition, which held 15 participants amongst 5 game-groups, 
three agents were part of a game-group with group expertise of “1”, of “2”, of 
“5”, of “7” and of “8” respectively. Kruskal-Wallis H (11.387) showed the dif- 
ference of ToE total indexes amongst the game group expertise in D-IC to be 
significant, with p = 0.023. Spearman’s rho measured the correlation between 
D-IC group expertise surrounding an agent, and the agent’s ToE total index to be 
not significant, with p = 0.113. 

In the R-IC condition, which held 15 participants in 5 game groups,
three agents each were part of a game group with a group expertise of “2”, “5”
and “9”, respectively. Six agents were part of game groups with a group expertise
of “10”. Kruskal-Wallis H (8.221) showed the difference in ToE total indexes across
group expertise levels in the R-IC to be significant, with p = 0.042. Spearman’s rho
measured the correlation between the R-IC group expertise surrounding an agent and
the agent’s ToE total index to be not significant, with p = 0.209.

In the C-IC condition, which held 15 participants in 5 game groups, three agents
were part of a game group with a group expertise of “8”. Six agents each were part
of game groups with a group expertise of “3” and of “9”. Kruskal-Wallis
H (2.663) showed the difference in ToE total indexes across group expertise levels
in the C-IC not to be significant, with p = 0.264. Spearman’s rho measured the
correlation between the C-IC group expertise surrounding an agent and the
agent’s ToE total index to be not significant, with p = 0.758.
Results for hypothesis 10 were mixed. N-IC and G-IC showed significant
relations between group expertise and logic deviation proportions, as well as
marginally or clearly significant differences regarding overall logic deviation.
D-IC narrowly missed significance at the 0.1 level for the correlation between
group expertise and logic deviation (p = 0.113), but showed a significant
difference regarding overall logic deviation (p = 0.023). R-IC and C-IC showed no
significant correlation between group expertise and logic deviation, but groups in
the R-IC differed significantly regarding overall logic deviation. The latter
supports the hypothesis and shows a high context dependency, which is regarded
as natural given the high complexity of this analysis.

Considering all details, hypothesis 10 cannot be clearly confirmed and can only
be confirmed partially. However, the results are regarded as promising enough
that a correlation between group expertise and logic deviation can be drawn.
After thorough consideration, hypothesis 10 is therefore confirmed; it will be
discussed in more detail in chapter 7.


6.12 Gender Effects 


While no significant differences in NPS performance between female and male
agents were measured (Chlupsa & Strunz, 2019; Strunz & Chlupsa, 2019), which
even held true across all countries of origin (Strunz, 2019), the efficiency of
adapting to more effective strategies has shown gender effects in behavioral
experiments (Casal et al., 2017).

Hypotheses that potentially relate to strategy adaption efficiency are therefore
analyzed for gender effects. It is hypothesized that no significant gender effects
will be found at all, as NPS performance, which is free of gender effects, is
regarded as the most fundamental basis for all forms of strategy adaption.

The 87 participants comprised 29 self-reported female and 58 self-reported male
participants.

The boxplot in figure 6.9 shows no visible gender effect when testing
hypothesis 1.

Strategy adaption efficiency during the well-defined stages is implicitly expressed
by ToH expertise, as agents who fail to adapt their strategy to the new goal rod
position during the well-defined stages have a lower chance of falling into the
high or medium expertise category.

Spearman’s rho shows significant correlation at the 0.01 level between exper- 
tise and well-defined logic proportion (ToH total) for all 29 female participants. 
Spearman’s rho shows significant correlation at the 0.01 level between expertise
and well-defined logic proportion for all 58 male participants. Therefore, no
gender effect was found for hypothesis 1.
Figure 6.9 Boxplot graph showing no gender effect between expertise and well-defined
logic proportion: 1 = female, 2 = male. Source own source


Analyzing hypothesis 2 for gender effects, the proportion of not ideal first moves
by female participants during stage 1 was identical to that during stage 4
(44.83 %); this stage 4 performance does not significantly differ from the mean
proportion of not ideal first moves over all stages (sum of rel. not ideal divided
by 6; 41.38 %), with p = 0.174. The results are summarized in table 6.2.

Not ideal first moves by male participants reached their maximum during stage 4
(54.39 %), which differed from the mean proportion of not ideal first moves
(42.00 %) at the 0.05 level, with p = 0.013. The results are summarized in
table 6.3. Female participants outperformed male participants regarding strategy
adaption to goal rod changes during the well-defined stages. Not ideal first move
proportions are marked bold at game stage 4, where the goal rod change takes
place and the former strategy has to be adapted efficiently.
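The percentages quoted above can be recomputed from the counts in tables 6.2 and 6.3: the relative share of not ideal first moves per game, and its mean as the sum of the six relative shares divided by 6. The following Python sketch uses only those table counts and reproduces 44.83 % and 41.38 % (female) as well as 54.39 % and 42.00 % (male). It does not reproduce the significance tests (p = 0.174 and p = 0.013), whose exact procedure is not stated here.

```python
from typing import List, Sequence

# First-move counts per ToH game 1-6, taken from tables 6.2 (female) and 6.3 (male)
FEMALE_NOT_IDEAL = [13, 13, 9, 13, 12, 12]
FEMALE_TOTAL = [29, 29, 29, 29, 29, 29]
MALE_NOT_IDEAL = [27, 23, 17, 31, 23, 22]
MALE_TOTAL = [58, 57, 56, 57, 56, 56]


def rel_not_ideal(not_ideal: Sequence[int], total: Sequence[int]) -> List[float]:
    """Relative share of not ideal first moves per game."""
    return [n / t for n, t in zip(not_ideal, total)]


def mean_rel_not_ideal(not_ideal: Sequence[int], total: Sequence[int]) -> float:
    """Mean over games: sum of rel. not ideal divided by the number of games (6)."""
    rel = rel_not_ideal(not_ideal, total)
    return sum(rel) / len(rel)


for label, bad, tot in (("female", FEMALE_NOT_IDEAL, FEMALE_TOTAL),
                        ("male", MALE_NOT_IDEAL, MALE_TOTAL)):
    rel = rel_not_ideal(bad, tot)
    print(label, f"game 4: {rel[3]:.2%}", f"mean: {mean_rel_not_ideal(bad, tot):.2%}")
```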


Table 6.2 Impact of “macrostructure shift” on female decision-making performance. Source
own source

female         | ToH game 1 | ToH game 2 | ToH game 3 | ToH game 4 (goal rod change) | ToH game 5 | ToH game 6
not ideal      | 13         | 13         | 9          | 13                           | 12         | 12
ideal          | 16         | 16         | 20         | 16                           | 17         | 17
total          | 29         | 29         | 29         | 29                           | 29         | 29
rel. not ideal | 0.448276   | 0.448276   | 0.310345   | 0.448276                     | 0.413793   | 0.413793
rel. ideal     | 0.551724   | 0.551724   | 0.689655   | 0.551724                     | 0.586207   | 0.586207


Whether or not a gender effect was found for hypothesis 2 is debatable, as the
sample sizes differ greatly and are limited in their statistical validity. For both
sexes, a global or local maximum of not ideal first moves was reached during
stage 4. However, the numbers show that female participants outperformed male
participants in adapting to a “sudden” goal rod change, which required an
immediate, effective and efficient change of strategy.

These results suggest that, contrary to the findings of Casal et al. (2017), there
can be particular cases in which female participants are more likely to adapt their
strategy efficiently, although this result must be considered cautiously given the
small sample size of the female group in this experiment. Whether or not this
observation is sufficient to be regarded as a gender effect requires further
analysis, perhaps including reflection times and larger sample sizes.

Analyzing for gender effects in hypothesis 3, logic deviation proportion results 
for the metastable ill-defined stages are shown in boxplot figure 6.10. 

While deviation does not directly translate into a more efficient strategy, the
metastable stages benefit from sticking with well-defined strategies, as the
metastable stages can be experienced as “well-defined” levels. For female
participants, Kruskal-Wallis H showed weakly significant differences at the
0.1 level (p = 0.091) amongst information conditions.

Differences amongst the information conditions regarding logic deviation in 
the metastable stages were less significant amongst male participants (p = 0.156). 
Table 6.3 Impact of “macrostructure shift” on male decision-making performance. Source
own source

male           | ToH game 1 | ToH game 2 | ToH game 3 | ToH game 4 (goal rod change) | ToH game 5 | ToH game 6
not ideal      | 27         | 23         | 17         | 31                           | 23         | 22
ideal          | 31         | 34         | 39         | 26                           | 33         | 34
total          | 58         | 57         | 56         | 57                           | 56         | 56
rel. not ideal | 0.465517   | 0.403509   | 0.303571   | 0.54386                      | 0.410714   | 0.392857
rel. ideal     | 0.534483   | 0.596491   | 0.696429   | 0.45614                      | 0.589286   | 0.607143
Figure 6.10 Logic deviation during metastable ill-defined stages: 0 = N-IC, 1 = G-IC,
2 = D-IC, 3 = R-IC, 4 = C-IC, and regarding sex: 1 = female, 2 = male. Source own source
Mann-Whitney U shows no significant differences between female and male
deviation distances in metastable stages (p = 0.401).

As Mann-Whitney U shows no significant differences between female and 
male deviation distances amongst all ill-defined stages (p = 0.543), hypothesis 4 
is not analyzed in further detail. 

Regarding hypothesis 5, which expects a positive relationship between expected
states proportions and logic deviation proportions, all expected states indices and
all logic deviation indices correlated at the 0.01 significance level without
exception, for both female and male participants. Figure 6.11 shows boxplot
results of expected states proportion for all ill-defined stages.
Figure 6.11 Expected states proportion during ill-defined stages: 0 = N-IC, 1 = G-IC,
2 = D-IC, 3 = R-IC, 4 = C-IC, and regarding sex: 1 = female, 2 = male. Source own source


Mann-Whitney U does not show significant differences regarding any expected
states proportion (ToE X tot: p = 0.746, ToE X 1: p = 0.438, ToE X 2: p = 0.759).
Therefore, no significant gender effects were found for hypothesis 5. Regarding
the logic marker analysis for hypothesis 6, Mann-Whitney U shows no significant
difference regarding “strategy randomness” between sexes (p = 0.389). Boxplot
figure 6.12 shows logic marker results for all information conditions.
Figure 6.12 Logic marker results during ill-defined stages: 0 = N-IC, 1 = G-IC, 2 = D-IC,
3 = R-IC, 4 = C-IC, and regarding sex: 1 = female, 2 = male. Source own source


To avoid confusion, it should be noted that the higher the logic marker index,
the more randomly a participant behaved. Spearman’s rho results are as follows:
For female participants, expected states index results of all ill-defined stages
(ToE X tot) correlated at the 0.01 level with logic marker results; expected states
index results of metastable ill-defined stages (ToE X 1) correlated at the 0.05 level
(p = 0.023) with logic marker results; and expected states index results of instable
ill-defined stages (ToE X 2) correlated at the 0.01 level with logic marker results.
Results for male participants were slightly different. For male participants, the
correlations between expected states indices and logic marker results were
significant at the 0.05 level for all ill-defined stages (ToE X tot, p = 0.017) and for
instable ill-defined stages (ToE X 2, p = 0.024), but failed to reach significance for
the metastable ill-defined stages in isolation (ToE X 1, p = 0.397).
Therefore, small differences between female and male participants regarding
“randomness” in their behavior were found during the metastable ill-defined
stages. It seems that random behavior during metastable ill-defined stages is less
explainable by (supposedly) personal expectation amongst male than amongst
female participants. However, since none of the “random” logic forms are framed
by the experiment’s model, influences stemming from other sources cannot be
excluded and are, in fact, unknown. Thus, whether this was a true gender effect
remains uncertain for hypothesis 6.

As described above, no significant differences between sexes regarding expected
states proportion were found. Gender effects for hypothesis 7 are therefore
disregarded.

As for hypothesis 8, routine consistency does not differ significantly between 
sexes according to Mann-Whitney U (p = 0.732). Boxplot figure 6.13 shows 
routine consistency (strategy volatility marker 1) of both female and male agents 
over all conditions. 

Minor differences can be seen in the C-IC; however, whether or not this difference
is related to gender cannot be clearly derived, especially as this information
condition is the most complex with regard to public information content. In
addition, the boxplot graphic does not differentiate between the different
ill-defined system states, namely metastable and instable.

Thus, no significant gender effects were assumed for hypothesis 8. 

For hypothesis 9, both the fundamental index and game group performance were
considered. However, game group performance cannot be analyzed, as the raw
data do not offer a reliable way to filter successfully solved stages. Nevertheless,
the fundamental index implicitly relates to the extent to which a group used an
effective strategy. Of the 29 game groups, 2 were female-only, 10 were male-only
and 17 were mixed, with female and male participants. The female-only game
group with game group ID 65 was part of the N-IC, and the female-only game
group with ID 68 was part of the R-IC. Although no correlation between
information condition and fundamental index results was found (Spearman’s rho,
p = 0.429), the female-only and male-only groups are sorted by condition first.

Results for the female-only game group with ID 65 (N-IC) showed that in 32 % of
all game group actions, the agents collectively agreed on which disk to move.

Results for the female-only game group with ID 68 (R-IC) showed that in 95 % of
all game group actions, the agents collectively agreed on which disk to move.

Of the 10 male-only game groups, game groups 15 and 35 were part of the N-IC
condition, and male-only game group 43 was part of the R-IC condition.
Figure 6.13 Routine consistency results during ill-defined stages: 0 = N-IC, 1 = G-IC,
2 = D-IC, 3 = R-IC, 4 = C-IC, and regarding sex: 1 = female, 2 = male. Source own source


Results for the male-only game groups in the N-IC showed that in 59 % (game
group 15) and 85 % (game group 35) of all game group actions, the agents
collectively agreed on which disk to move.

Results for male-only game group 43 (R-IC) showed that in 92 % of all game
group actions, the agents collectively agreed on which disk to move.

Kruskal-Wallis H showed no significant difference between mixed, female- 
only and male-only results regarding fundamental index (p = 0.602). Fundamental 
index average of mixed groups was 0.7506 (SD = 0.1950), average of female- 
only groups was 0.6350 (SD = 0.3451), average of male-only groups was 0.7810 
(SD = 0.2048). Figure 6.14 shows boxplot results of fundamental indices. 
Figure 6.14 Fundamental index results for mixed sexes (0), female-only (1) and male-only
(2) game groups. Source own source


Therefore, no significant gender effect regarding hypothesis 9 was found. The 
final hypothesis 10 considers group expertise. Kruskal-Wallis H shows no signi- 
ficant differences between mixed, female-only and male-only groups regarding 
group expertise (p = 0.720). Figure 6.15 shows boxplot results for group expertise 
in mixed, female-only and male-only game groups. 

Gender effects for hypothesis 10 regarding correlation between group expertise 
and logic deviations were tested for mixed-gender, female-only and male-only 
game groups. This analysis was done without considering different information 
conditions, as this was not considered to be relevant for gender effects analysis. 

For mixed-gender groups Spearman’s rho correlation between group expertise 
and all ill-defined logic proportions (ToE tot) was significant at the 0.05 level 
(p = 0.011). For the two female-only groups Spearman’s rho showed signifi- 
cance at the 0.05 level (p = 0.017). For the ten male-only groups Spearman’s rho 
showed significance at the 0.05 level (p = 0.014). Therefore, gender effects are 
disregarded for hypothesis 10. A detailed discussion follows in chapter 7. 
Figure 6.15 Group expertise results for mixed sexes (0), female-only (1) and male-only (2)
game groups. Source own source
Discussion


This chapter discusses the empirical results, presents additional results, and
compares the derived insights with other scientific conclusions from the domain of
behavioral economics. The first subchapter summarizes the understanding of agent
behavior gained from the results of the various hypotheses, and includes further
results from statistical analyses. The second subchapter discusses strengths,
weaknesses, opportunities and threats of the scientific methods used. The third
subchapter provides an overview of all limitations, and the fourth subchapter
suggests potential methodological variations and recommendations for future
research.


7.1 Discussion of Experimental Results 


Evaluating individual expertise in the well-defined problem “Tower of Hanoi”
by the number of “perfectly solved” games, and filtering by “failing not more
than one game”, has proven to categorize participants very reliably by their logic
deviation. This is true not only for the well-defined problem-solving stages. For
the ill-defined problem-solving stages, where ToE has to be played, Kruskal-Wallis-H
shows significant differences by individual expertise regarding ToE total
(H(33, 16, 38) = 7.775, p = 0.021) and regarding ToE parts1 (H(33, 16, 38) =
10.692, p = 0.005). Individual expertise only fails to show clearly significant
differences in the “chaotic” ill-defined stages (H(33, 16, 38) = 4.526,
p = 0.104). Still, overall, the expertise categories show significant differences in
the ill-defined stages. The correlation of expertise with the ill-defined logic
proportions is significant at the 0.01 level for ToE total (Spearman’s rho,
p = 0.005), significant at the 0.01 level for the “metastable” ill-defined stages,
ToE parts1 (Spearman’s rho, p = 0.001), and significant at the 0.05 level for the
“chaotic” ill-defined stages, ToE parts2 (Spearman’s rho, p = 0.038).
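With three expertise groups, the Kruskal-Wallis statistic is referred to a chi-square distribution with 3 − 1 = 2 degrees of freedom, whose tail probability has the simple closed form below. Applying it to the H values above yields p-values that closely match the reported ones (0.021, 0.005 and 0.104); it is offered only as a consistency check on the reported figures, not as the procedure used to obtain them.

```latex
p = P\left(\chi^2_{2} \ge H\right) = e^{-H/2}

e^{-7.775/2} \approx 0.0205, \qquad
e^{-10.692/2} \approx 0.0048, \qquad
e^{-4.526/2} \approx 0.104
```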
Figure 7.1 Boxplot results of logic proportion during “ill-defined” stages over expertise
levels. Source own source


Agents with higher expertise in the well-defined problem-solving stages also
behaved less “randomly” in the ill-defined stages, at least from the perspective of
the methodological model. Kruskal-Wallis-H shows highly significant differences
in logic marker proportions amongst the expertise levels (H(33, 16, 38) = 18.835,
p < 0.001), and the Spearman’s rho correlation between well-defined
problem-solving expertise and logic marker proportions proves to be significant
at the 0.01 level, with p < 0.001. The logic marker is an index representing the
proportion of an agent’s ToE actions that do not fall inside a known logic
category. In addition, as shown in figure 7.1, the higher the expertise level, the
more actions during the ill-defined stages conform to the routine logic. Expertise
levels are measured by skillful puzzle-solving in the well-defined ToH stages,
where the routine strategy is defined. The ToE tot variable represents the
proportion of actions that are part of the routine strategy. In other words, the
higher the individual expertise in the well-defined stages, the less participants
seem to leave their routine strategy path during the ill-defined stages.
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Therefore, problem solving expertise, as is measured in this thesis, not 
only relates to well-defined problem-solving performance, but also to ill-defined 
problem-solving behavior. Agents with high well-defined problem-solving exper- 
tise deviated less from their routine strategy and also behaved less random during 
the ill-defined problem-solving stages. 
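Both summary variables described above are shares of an agent's ToE actions. A minimal sketch, assuming each action has already been labelled with the logic category it matches (or `None` when it matches none); the label names and the function `logic_proportions` are hypothetical, since the categorization rules themselves are not given here.

```python
from collections import Counter
from typing import Optional, Sequence


def logic_proportions(labels: Sequence[Optional[str]], routine_label: str = "routine") -> dict:
    """Share of an agent's ToE actions for two summary variables.

    labels: one entry per action; a category name, or None when the action
    matches no known logic category (hypothetical encoding).
    """
    if not labels:
        raise ValueError("no ToE actions recorded for this agent")
    counts = Counter(labels)
    n = len(labels)
    return {
        # ToE total: proportion of actions that belong to the routine strategy
        "toe_total": counts[routine_label] / n,
        # logic marker: proportion of actions outside every known logic category
        "logic_marker": counts[None] / n,
    }


example = logic_proportions(["routine", "routine", None, "other"])
```

Here the example agent gets a ToE total of 0.5 and a logic marker of 0.25.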
Figure 7.2 Boxplot results of logic marker proportions over expertise levels. Source: own source


This is shown in figure 7.2: higher individual expertise levels were associated with fewer actions that were not part of any category and are thus considered “random” actions. This correlation is captured by the logic marker variable, which represents the proportion of actions that do not fall inside known logic categories, and by expertise, which represents skillful puzzle-solving of well-defined ToH stages. In other words, the higher the individual expertise in the well-defined stages, the less randomly individuals behaved during the ill-defined stages. As expected, the environmental change of the goal rod position influenced well-defined problem-solving performance significantly. Individual expertise can be linked to those agents who did not fall for the goal rod change and immediately shifted their routine strategy. Of 33 low-expertise agents, only 8 managed to start ToH level 4 with an ideal action. Of 16 medium-expertise agents, only 4 managed to start ToH level 4 with an ideal action. Of 38 high-expertise agents, 30 managed to start ToH level 4 with an ideal action. Those who made a mistake at the first move of ToH level 4, where the goal rod was changed, are more likely to be found in the “low” or “medium” expertise categories. Individual expertise correlates significantly with agents avoiding this mistake at the first action of level 4. Spearman-Rho shows the two-sided correlation between expertise and this mistake to be significant at the 0.01 level (p < 0.001), and Mann-Whitney-U shows the difference in expertise between agents who made the mistake and agents who did not to be highly significant (U(45, 42) = 436.000, z = −4.673, p < 0.001).
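The Mann-Whitney-U result above can be recomputed from the reported counts alone. The sketch assumes that expertise enters the test as an ordinal score (low < medium < high), so that all agents within one level are tied, and uses the standard normal approximation with tie correction; the 38 high-expertise agents follow from the group sizes H(33, 16, 38). The function name `mann_whitney_from_counts` is illustrative.

```python
import math


def mann_whitney_from_counts(counts_a, counts_b):
    """U for group a and tie-corrected z, for data on shared ordinal levels.

    counts_a[k] and counts_b[k] give how many members of each group sit at
    ordinal level k, with levels listed in ascending order.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    rank_sum_a = 0.0
    next_rank = 1
    tie_term = 0
    for a, b in zip(counts_a, counts_b):
        t = a + b  # all t observations at this level are tied
        mean_rank = next_rank + (t - 1) / 2
        rank_sum_a += a * mean_rank
        next_rank += t
        tie_term += t ** 3 - t
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return u_a, (u_a - mean_u) / math.sqrt(var_u)


# expertise levels in ascending order: low, medium, high
made_mistake = [33 - 8, 16 - 4, 38 - 30]  # 45 agents
ideal_start = [8, 4, 30]                  # 42 agents
u, z = mann_whitney_from_counts(made_mistake, ideal_start)
```

Under these assumptions the sketch yields U = 436 and z ≈ −4.673 for the group that made the mistake, matching the values reported above.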

During the metastable stages, Kruskal-Wallis H showed the deviation in expected states to differ amongst the 5 information conditions (ToE X parts 1) at the 0.1 level (p = 0.063), as shown in figure 7.3. This indicates that agents' experience of feedback differed depending on the information condition; expertise nevertheless remained a reliable predictor of consistent behavior.
Figure 7.3 Boxplot results of expected states proportion during “metastable” stages over information conditions: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC. Source: own source


This insight adds another important property to the significance of the expertise categories. Agents with high expertise were significantly more likely than agents with medium or low expertise to adapt to visual environmental change that affects their strategy's performance.

Of all logic proportion analyses, the results on deviation from the routine logic were the most surprising. Contrary to expectations, agents did not deviate strongly from their routine strategy; in fact, they more or less stuck to it. Instead, it was the behavior in the no information and dissolution information conditions that matched what had been expected of the routine information condition. The anticipated ordering of logic proportions was therefore roughly turned “upside down”.

A significant difference in routine logic proportion was found during the metastable ill-defined stages, where behavior in the routine information condition deviated least from its routine logic, while behavior in the dissolution information condition deviated most. When logic proportions were analyzed over all ill-defined stages, including the “chaotic” stages, this statistical significance vanished. Differences in routine proportions were clearly non-significant when only the “chaotic” ill-defined stages were observed (Kruskal-Wallis H(18, 24, 15, 15, 15) = 1.440, p = 0.837).

The proportion of individually experienced expected outcomes was shown to correlate with individual logic proportions at the 0.01 level. Moreover, the proportions of experienced expected outcomes differed amongst the information conditions only in the metastable ill-defined stages (figure 7.3), and only with weak significance (p = 0.063). Overall, differences between the information conditions regarding experienced logical feedback were not significant (p = 0.312), especially during the “chaotic” stages (p = 0.423). In other words, all agents experienced a comparable level of “chaotic feedback” and did not differ much in their behavior. Only for the metastable ill-defined stages can meaningful statements be made regarding behavior and experience. Here, behavior in the routine information condition deviated least from its routine strategy, and feedback was the least “chaotic”. During the ill-defined and unstable ToE stages, no significant difference in deviation from the routine strategy (ToE parts2) amongst information conditions was found, as shown in figure 7.4. In other words, agent behavior regarding logic deviation was comparable during stages that provided more chaotic feedback.

Random agent behavior, expressed by a high logic marker, did not differ significantly amongst conditions (Kruskal-Wallis H(18, 24, 15, 15, 15) = 5.714, p = 0.222), but was shown to correlate with experiencing “chaotic” feedback across all ill-defined stages. As chaotic feedback was comparable amongst all conditions, this result was no surprise. In addition, routine consistency did not differ significantly amongst the information conditions either. Routine consistency describes how many of the performed actions followed the routine strategy category.

Figure 7.4 Boxplot results of logic proportion during “unstable” stages over information conditions: 0 = N-IC, 1 = G-IC, 2 = D-IC, 3 = R-IC, 4 = C-IC. Source: own source


High individual expertise was found to correlate significantly with low random behavior, and it also correlates at the 0.01 level with high routine consistency (Spearman-Rho, p = 0.002). The difference in routine consistency proportions amongst the individual expertise levels was highly significant (Kruskal-Wallis-H(33, 16, 38) = 9.844, p = 0.007).

The higher the individual expertise in well-defined problem solving, the more routine strategy actions were performed; in other words, the higher the individual expertise, the higher the routine consistency, as can be seen in figure 7.5.
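Because expertise has only three levels, most observations are tied, and Spearman-Rho is then computed as the Pearson correlation of tie-averaged ranks. A minimal sketch, assuming expertise is coded as an ordinal score (0, 1, 2) and paired with a per-agent proportion such as routine consistency; the data below are hypothetical, the function names are illustrative, and the p-value (which would need a t or permutation distribution) is left out.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# hypothetical data: expertise coded 0 = low, 1 = medium, 2 = high
expertise = [0, 0, 1, 2, 2]
routine_consistency = [0.41, 0.55, 0.60, 0.72, 0.90]
rho = spearman_rho(expertise, routine_consistency)
```

Averaging tied ranks keeps agents of the same expertise level interchangeable, which is what an ordinal three-level variable requires.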

Game-group performance was found to rely heavily on agents agreeing on which disk to move, which significantly enhances the chances of beating randomness. In order to know how many moves are required to solve ToE when actions are chosen randomly, five bot groups played 6 ill-defined ToE settings, with the goal rod changing at the fourth level, just as in the main experiment. The bot groups required more than 166 steps on average to solve a ToE game with the goal rod positioned at the center, and more than 113 steps on average to solve a ToE game with the goal rod positioned on the right. The minimum number of steps for solving any ToE stage randomly was 25; the maximum was 727. The bots required more than 139 steps on average to solve any ToE stage. At the time of measurement, the bot game group was implemented in such a way that all three bots received the identical random input and therefore always had a fundamental index of “1”. For this reason, the bot groups did not behave perfectly randomly, as all bots agreed on disk and distance. Of all 29 groups observed, only two did not outperform randomness, requiring more than 139 steps to solve all ToE stages. Due to unreliable variables, it was unclear whether a game group finished a ToE stage by solving it properly in time or by failing to solve it in time. The time in seconds required per game was saved, but was also deemed unreliable. For this reason, no statement about group performance can be made.
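A random baseline can be approximated for the plain Tower of Hanoi with a random walk over legal moves. This is a hedged sketch: it assumes three rods, a hypothetical disk count, and uniformly random legal moves by a single player; it does not model the ToE group ruleset or the bots' shared random input, so it is not expected to reproduce the ToE step counts reported above. The function names are illustrative.

```python
import random
from statistics import mean


def random_hanoi_steps(n_disks, goal_rod, rng, max_steps=1_000_000):
    """Count uniformly random legal moves until every disk sits on goal_rod.

    Standard three-rod Tower of Hanoi, all disks starting on rod 0;
    this does not model the ToE group ruleset.
    """
    if goal_rod not in (1, 2):
        raise ValueError("goal_rod must be 1 (center) or 2 (right)")
    rods = [list(range(n_disks, 0, -1)), [], []]  # last element = top disk
    target = list(range(n_disks, 0, -1))
    steps = 0
    while rods[goal_rod] != target:
        if steps >= max_steps:
            raise RuntimeError("no solution within max_steps")
        # a move is legal if the source has a disk and the destination is
        # empty or has a larger disk on top
        legal = [(s, d) for s in range(3) for d in range(3)
                 if s != d and rods[s] and (not rods[d] or rods[d][-1] > rods[s][-1])]
        src, dst = rng.choice(legal)
        rods[dst].append(rods[src].pop())
        steps += 1
    return steps


def random_baseline(n_disks, goal_rod, trials, seed=0):
    """Step counts of repeated random solves, reproducible via the seed."""
    rng = random.Random(seed)
    return [random_hanoi_steps(n_disks, goal_rod, rng) for _ in range(trials)]


runs = random_baseline(n_disks=3, goal_rod=1, trials=200)
average_steps = mean(runs)
```

Every run needs at least 2^n − 1 moves (7 for three disks), since that is the length of the optimal solution.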

Correlation between group expertise and ToE logic proportions was significant at the 0.05 level for the N-IC and at the 0.01 level for the G-IC. Analysis with Kruskal-Wallis H was significant in all but the C-IC condition. Statistical analysis has shown enough potential correlations between group expertise and logic deviations to confirm hypothesis 10.

Gender effects were tested in detail, and while some small deviations between female and male behavior were found, the existence of convincing gender effects was dismissed overall. Some small differences in strategy adaptation to the goal rod change were found, in which female participants performed better. In addition, female participants' behavior was framed by the experiment's model to a greater extent than male behavior, i.e., less of it counted as random. Aside from these two minor differences, gender effects are not visible. This is in line with research on NPS performance, where no gender effects were visible for any age or country of origin (Strunz, 2019; Strunz & Chlupsa, 2019).

After thorough analyses, the most promising independent variable was individual expertise. Agents with high expertise not only performed well during the well-defined problem-solving stages, adapting their strategy instantly to environmental change, but also showed less routine logic deviation in the ill-defined stages and behaved in a less random and volatile way when solving ill-defined games.


7.2 Methodological Analysis 


The transfer from offline to online experimental analysis was a success, as interpersonal communication between agents was avoided. In addition, the online functionality enabled experiments to be run in a matter of minutes. Experiments running on CuriosityIO can be modified quickly if required, and CuriosityIO enables live observation of each agent. By implementing bots and time limits, and by kicking inactive players automatically, ethical payment was preserved, as agents played 31 minutes on average for a pay of 6.10 USD. It took dozens of iterations to structure the multi-agent experiment in such a way that the average completion time could be anticipated. As a safeguard, MTurks should be informed that submitting incomplete data will not lead to a rejection once a certain time threshold has been exceeded, in this case 50 minutes. Otherwise MTurks tend to cancel the experiment without submitting the data, in order to avoid a rejection. For MTurks, avoiding rejections matters more than financial loss, as the HIT approval rate is the most common filter for experiments on Amazon Mechanical Turk, with thresholds usually set between 95 and 99 %. When a large experiment fails, for example due to a server crash, it is better to have MTurks submit incomplete data quickly, as compensating MTurks who did not submit their data is problematic. In such cases, individual “fake” experiments or “compensation HITs” have to be created for each agent, which can lead to a huge organizational workload. After the server crash in the experiment with 330 participating MTurks, those who had failed to submit were partially compensated via PayPal; however, paying MTurks via PayPal violates Amazon Mechanical Turk's terms of service. Also, live support via email during large online multi-agent experiments is mandatory. Participants need to receive answers within less than 2 minutes in order to feel guided. Many questions arise during all online experiments, leading to dozens or even hundreds of emails that must be answered within very short timeframes. The experimenter should prepare experiments accordingly to avoid being overwhelmed by organizational work due to compensation or support requirements. As experimenters are rated online and MTurks are well connected, experimenters should take ethical payment and sound experiment structure seriously.

All in all, the way Amazon Mechanical Turk works seems less than ideal for conducting multi-agent experiments under uncertainty. In order to avoid bots or low-quality data, the required HIT approval rate should be greater than or equal to 99 %. However, with such a high threshold, not enough participants will join within a short time span. To gain enough meaningful data, many agents had to join within a short amount of time, so that game groups were not automatically filled with a bot. A bot had to be implemented so that MTurks would not have to wait longer than a couple of minutes until the experiment started. This was mandatory for ethical payment, as the time limit for a participant has to be pre-set for any HIT. If a participant fails to finish a HIT (a paid task such as this experiment) within that pre-set amount of time, the MTurk will not be able to submit, and the experimenter will have a hard time compensating them. However, pre-setting the number of minutes is mandatory so that MTurks can calculate and anticipate their earnings. When the required HIT approval rate is lower than 99 %, the experimenter risks lower-quality individual data, but enables more participants to join within a short time span. Paradoxically, a lower threshold raises data quality for this particular experiment, as more of the data becomes meaningful; with too low a threshold, however, individual data become less valuable. For the main experiment, a HIT approval rate of “greater than 95 %” was chosen, and it is recommended that experimenters take into consideration the perspectives of MTurks via online communication channels such as “Reddit”. There, the author gained enough insight from MTurks to find the ideal HIT approval rate for the experiment.
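On Amazon Mechanical Turk, the approval-rate filter discussed above is expressed as a qualification requirement on the HIT. A minimal sketch, assuming the system qualification ID for the worker HIT approval rate (`000000000000000000L0`) and the threshold chosen for the main experiment (“greater than 95 %”); the helper name `approval_rate_requirement` is hypothetical, and the resulting list is meant for the `QualificationRequirements` argument of the CreateHIT operation (e.g. boto3's `create_hit`).

```python
def approval_rate_requirement(min_percent):
    """QualificationRequirements entry that admits only workers whose
    HIT approval rate is strictly greater than min_percent."""
    if not 0 <= min_percent <= 100:
        raise ValueError("min_percent must lie between 0 and 100")
    return [{
        # system qualification Worker_PercentAssignmentsApproved
        "QualificationTypeId": "000000000000000000L0",
        "Comparator": "GreaterThan",
        "IntegerValues": [min_percent],
        # workers below the threshold may preview but not accept the HIT
        "ActionsGuarded": "Accept",
    }]


requirements = approval_rate_requirement(95)
# e.g. boto3.client("mturk").create_hit(..., QualificationRequirements=requirements)
```

Lowering the argument widens the pool of eligible workers, which is the trade-off between group fill speed and individual data quality described above.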

Even though there are many studies about the behavior and data quality obtained in experiments with MTurks, little is actually known about each individual participant. More information about each individual MTurk would have to be available to the experimenter for higher-quality experiments. An additional feature that would enhance data quality is an online lobby, where MTurks could idle without losing time and money. Such features would have to be implemented for multi-agent experiments to be more effective, ethical, and efficient. As most freelancers working on Amazon Mechanical Turk are from either India or the United States, alternatives to Amazon Mechanical Turk should be considered if participants from, e.g., Europe are required. The Amazon Mechanical Turk business model is gaining popularity, and many alternatives are currently being developed that offer more information about the individual freelancers, as well as enough European participants for experiments with more diverse countries of origin.

As response times are valuable predictors of behavior, online experiments should run on stable infrastructure to ensure that the recorded response times are not erroneous. Even after one year of optimizing both infrastructure and software performance, response times were still not reliable enough for statistically meaningful analyses. In addition, any server running multi-agent experiments should be equipped with far more memory than is anticipated to be required. While it was suggested that a server with 1 GiB of working memory would certainly suffice for an experiment with 330 agents, in reality the server with this setup crashed. Even a server with 32 GiB of working memory showed a CPU load of 55 % while computing an experiment with only 180 agents. The author recommends at least 128 GiB of working memory for experiments with a four-digit number of participants. In addition, at least one stress test with a couple of hundred non-simulated participants should be conducted beforehand.


7.3 Limitation 


Participants were confronted with the cognitive puzzle game “Tower of Hanoi” and its multiplayer version “Tower of Europe”. As this puzzle game might not have been new to some participants, and thus a well-defined task from the very beginning, ex-ante expertise can lead to a fast learning curve in the well-defined problem-solving stages. In addition, some participants self-reported having encountered the experiment before and might have had some a-priori knowledge. However, none of the participants who self-reported having encountered the experiment before could have played the ill-defined stages, as the first experiment, with 1 GiB of server memory, crashed right after the well-defined stages. Still, the statistical analysis did not treat these participants differently in the second, successful experiment, which was equipped with 32 GiB of memory.

After the well-defined stages, the experiment transitions to an ill-defined problem, with the first three games being “metastable” and the last three games representing “chaotic” decision-making circumstances. The order of the experiment's problem-solving stages (well-defined; ill-defined and metastable; ill-defined and chaotic) models real-world experiences and challenges, but is also a limitation in itself, as in real-world decision-making any order of problem categories or decision system states might occur.

Participants received information on the outcome of their individual actions, but no further details about how the hidden ruleset works, i.e., how their decisions influenced the outcome. Participants therefore received simple rather than rich feedback, which limited learning.

No analysis including response times was conducted, as these data were not yet reliable. Group performance could not be evaluated because variables that could clearly indicate whether an ill-defined stage was solved by performing the right actions were still missing. Some statistical evaluations would clearly benefit from a larger pool of participant data; however, software efficiency and stability would have to be tweaked further to enable experiments with more than one thousand participants. It is estimated that at least 2,700 participants would be required to derive insights about inter-group differences across five conditions from sound nonparametric analyses. With a data dropout rate of about 50 %, the number of participants should be in the thousands to ensure sufficient data quantity. This thesis relies on 87 data points derived from a pool of 180 participants; therefore, all insights are limited in their statistical validity.
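The recruitment estimate is simple arithmetic: the number of usable data points needed, divided by the share that survives dropout. A minimal sketch, assuming the 2,700 refers to usable data points (the text does not say so explicitly) and taking the 87 usable data points out of 180 participants as a reference for the observed dropout; the function name is illustrative.

```python
import math


def recruits_needed(usable_required, dropout_rate):
    """Participants to recruit so that, after dropout, usable_required remain."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(usable_required / (1 - dropout_rate))


observed_dropout = 1 - 87 / 180       # share of the 180 participants not usable
needed = recruits_needed(2700, 0.5)   # 2,700 usable data points at 50 % dropout
```

Under that assumption, 2,700 usable data points at 50 % dropout require 5,400 recruited participants, and the observed dropout here was about 52 %.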


7.4 Future Outlook 


The Tower of Hanoi is both thoroughly researched and widely used in behavioral experiments. Flag Run and Tower of Europe are experimental novelties, which might benefit from scientific insights regarding insight problem solving, working memory capacity, and cultural uncertainty avoidance, and from conducting the experiment with different models of learning environments. Multiple learning environments could be simulated by altering the content of the instructions or by implementing rich feedback. Experiments that differ in their visual representation, yet still run on the identical logic of Flag Run or Tower of Europe, such as an interactive stock exchange game, could be designed. Interpersonal communication could be included via chat windows offering a list of predefined text passages to choose from. The algorithm ensures that, in any case, a multi-agent group decision-making domain is created, where each individual decision influences the outcome while not necessarily having an impact on the group decision output. The algorithm is fair and unbiased, and even if its rules are known, it can only be taken advantage of when agents can agree on their order of action input; in other words, when agents are able to synchronize their actions. However, the algorithm can be made arbitrarily complex, so that even if communication between agents were enabled and agents communicated their order of inputs, they would not be able to take full control over the outcome. Therefore, stable, metastable and chaotic decision-making environments can easily be simulated. An arbitrary number of agents per group can be used, and the algorithm can also be applied to games with multi-dimensional decisions.
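The properties claimed for the algorithm can be illustrated with a toy rule. This is a hypothetical sketch, not the Flag Run or Tower of Europe algorithm, whose rules are not given here: moves are encoded as (disk, distance) pairs, a unanimous group gets exactly the move it agreed on, and any disagreement produces a move derived from all inputs in their arrival order, so each input matters but no single agent controls the result. The names and the encoding are assumptions.

```python
import hashlib
from typing import Sequence, Tuple

Move = Tuple[int, int]  # (disk, distance); hypothetical encoding


def toy_group_move(inputs: Sequence[Move], n_disks: int = 3, n_rods: int = 3) -> Move:
    """Hypothetical aggregation rule, not the ToE algorithm.

    A unanimous group gets exactly the move it agreed on. Otherwise the
    executed move is derived from all inputs in arrival order, so every
    agent influences the outcome but no single agent controls it.
    """
    if not inputs:
        raise ValueError("need at least one agent input")
    if all(move == inputs[0] for move in inputs):
        return inputs[0]
    # hash the ordered inputs: changing any input or the order changes the digest
    digest = hashlib.sha256(repr(list(inputs)).encode()).digest()
    disk = digest[0] % n_disks + 1            # 1..n_disks
    distance = digest[1] % (n_rods - 1) + 1   # 1..n_rods-1
    return disk, distance


agreed = toy_group_move([(2, 1), (2, 1), (2, 1)])
mixed = toy_group_move([(1, 1), (2, 1), (3, 2)])
```

Reordering the same disagreeing inputs will generally yield a different move, which is why only synchronized agents could exploit such a rule.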
8 Conclusion


The aim of this thesis was to create a decision-making domain in which multiple agents would collectively engage with a problem without being able to communicate with each other. Furthermore, group decision-making was structured such that each agent would always have an influence on the outcome, but could not control the impact of their decisions. Different information conditions simulated information asymmetries, from which potential behavioral changes were to be analyzed. Agents were able to build up expertise in a well-defined learning environment and later engaged in an ill-defined, metastable and unstable decision-making domain, which was dominated by either seemingly deterministic or chaotic feedback. In order to create a problem under uncertainty, “Tower of Hanoi” was chosen as the problem for analysis; it lacks any numerical representation and thus further prevents human mental models from building up subjective or even objective probabilities. Different variations of logic, strategies, and feedback were examined in order to derive as much information as possible from this group decision-making experiment. The core idea was that this experiment represents reality, where an agent would first gain experience and learn about the systematics of a market (e.g., by attending a business school) by engaging in well-defined problems. Upon having gained some expertise, which varies amongst agents, they could then explore the real world and solve ill-defined problems with their expert knowledge. Real-world problems were first simulated as metastable, changing to a more chaotic problem afterwards. Many economic decisions are taken without communicating directly with all shareholders or stakeholders, and agents collectively solve ill-defined problems, with each agent having different sets of information and different strategies and ideas about the “hidden rules” of some market or complex decision-making domain. From this idea, five research questions led to 10 hypotheses.

The first research question probed whether public information about environmental change would necessarily lead to a change of behavior when the new environmental conditions did not affect a strategy's performance. The routine information group showed that this was not the case: agents in the routine information group stuck to their routine strategy from the well-defined problem-solving stages during the metastable condition. The second research question asked whether behavior changed when environmental change does affect a strategy's performance; here, individual expertise proved to be a strong predictor of whether an agent was able to adapt or to stick to an effective routine strategy. High expertise led to less random and volatile behavior in the ill-defined problem-solving stages, and enabled agents to adapt quickly to environmental conditions in the well-defined stages. The third research question concerned deviation from the routine strategy when different types of information, with truthful and deception-free content, were provided. Here, results were less clear, and individual expertise was certainly a stronger predictor than public information. The fourth research question can also be answered by focusing on individual expertise rather than on public information: the higher the individual expertise in the well-defined problem-solving domain, the higher the chances were that participants would maintain an effective routine strategy or adapt their routine if necessary. While public information did not significantly influence the overcoming of parts of a routine strategy, the dissolution information group seems to have deviated the most. Perhaps public information stating that individuals were unable to obtain helpful information about the hidden rules discouraged agents, favoring random behavior or reducing individual motivation to engage in problem-solving with smart heuristics. Further research on the influence of public information that fosters a belief in a lack of control could shed light on this assumption. The fifth and final research question was partially answered. Individual expertise in the well-defined problem-solving stages showed a strong, significant correlation with behavior in the ill-defined stages. While the experiment failed to reach conclusions about group performance, the role and impact of individual expertise were surprising, as it held more predictive power regarding group decision-making than public information.

All hypotheses were analyzed in detail for gender effects, and no convincing differences in behavior between female and male participants were found that would suggest gender effects. Just as for NPS performance, where no gender effects were found for any age or country of origin, such as the United States, India and Germany, solving ToE in a smart and intuitive way showed no gender effects. If anything, female participants performed better in strategy adaptation during the well-defined stages, which was a crucial part of ranking high in individual expertise. This result is clearly favorable for the idea of inclusion in modern workplaces, where NPS performance and smart decision-making under uncertainty will play an ever-growing role.

Expert knowledge could be the key factor for global and interconnected 
problems, where interpersonal communication is impossible or vastly limited. 
Identifying the ideal decision-making positions for experts through quick and 
effective online experiments could lead to less volatile, less chaotic system 
performance, from which all decision-makers can profit. 